Economic Opportunity Gaps in Allegheny County: A Tract-Level Analysis¶
EDA/Data Visualization (90-800) Final Project¶
Sofia Hutton | Sdhutton | December 2025¶
AI Disclaimer¶
Figure layout and subplot structure adapted with assistance from ChatGPT (EDA/data visualization support). Heatmap normalization and hovertext pattern developed with help from ChatGPT. Mapbox choropleth configuration (geometry → geojson pattern) adapted with assistance from ChatGPT.
Abstract¶
This project analyzes tract-level opportunity gaps across Allegheny County using ACS (2022) and FFIEC (2022) data. By examining disparities in employment, education, infrastructure access, and housing stability, it identifies where economic conditions diverge most sharply between lower- and higher-income neighborhoods. The analysis is informed by the Regional Economic Connectivity framework from SRI/ICIC, focusing on how well people and places are linked to opportunity. The results highlight specific neighborhoods and indicators that warrant targeted, place-based policy attention.
1. Introduction & Motivation¶
Why This Project?¶
Economic mobility depends heavily on place. In Pittsburgh, it’s easy to see how neighborhood-level differences in employment, education, housing stability, and infrastructure shape people’s lived opportunities. This project uses ACS (2022), FFIEC (2022), and tract-level derived indicators to map those disparities across Allegheny County.
The goal is straightforward: identify where opportunity gaps are largest, understand what drives them, and highlight patterns that matter for equitable regional development. The approach mirrors a simplified version of SRI and ICIC’s Regional Economic Connectivity framework, which focuses on how well people, places, and systems are linked within a region.
Personal Connection¶
I came to this topic from both professional experience and personal curiosity. Before moving to Pittsburgh, I worked in Washington, D.C. on regional economic development projects—work that taught me how policy, infrastructure, and labor markets interact at the metropolitan level. But that work was always somewhat distant; I analyzed regions I didn’t actually live in.
Relocating to Pittsburgh shifted that. As a new resident, I noticed stark differences between neighborhoods and realized how little I understood about the local economic landscape beneath the city’s post-industrial “comeback” narrative. This project became a way to build that understanding using data rather than assumptions.
It also reconnects to the regional connectivity framework I worked with at SRI and ICIC: the idea that strong regions aren’t just prosperous—they are well-connected. When neighborhoods have reliable access to jobs, transportation, broadband, and stable housing, opportunity becomes more evenly distributed. When those connections break down, disparities widen.
By examining tract-level differences in employment, education, infrastructure, and housing, this analysis tries to capture where Pittsburgh’s regional connectivity is strong—and where it frays.
Policy Implications¶
A tract-level view helps identify which disparities matter most and where targeted interventions could have the biggest impact. If low-income tracts show strong educational attainment but weak digital access, broadband becomes a priority. If rent burden or vacancy rates cluster spatially, neighborhood stabilization strategies may be more effective than regionwide programs.
This kind of diagnostic supports place-based policymaking: interventions that respond to the actual conditions of specific communities rather than assuming every neighborhood faces the same barriers.
Research Questions¶
- How do employment opportunities differ between low-income and high-income census tracts?
- What is the relationship between educational attainment and economic outcomes?
- Do infrastructure gaps (broadband, transit) correlate with income classification?
- How does housing affordability vary across income groups?
- Which census tracts face the most severe multi-dimensional opportunity deficits?
2. Dataset Description¶
Data Sources¶
2.1 American Community Survey (ACS) 5-Year Estimates (2018-2022)¶
- Source: U.S. Census Bureau via API
- Coverage: 84,415 census tracts nationwide
- Specific Focus: 394 census tracts in Allegheny County, PA
- Access: https://api.census.gov/data/2022/acs/acs5
- Variables: Labor market, education, housing, income, infrastructure, demographics
2.2 FFIEC Income Classification Data (2022)¶
- Source: Federal Financial Institutions Examination Council
- Purpose: Official income level classifications for Community Reinvestment Act
- Access: https://www.ffiec.gov/censusapp.htm
- Classifications:
- Low Income: < 50% of MSA median
- Moderate Income: 50-79% of MSA median
- Middle Income: 80-119% of MSA median
- Upper Income: ≥ 120% of MSA median
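The thresholds above can be written as a small lookup. This is a minimal sketch, assuming the input is the tract's median family income as a percentage of the MSA median and that the band edges are half-open (79.99 is Moderate, 80.00 is Middle). The helper name `classify_income_level` and the treatment of missing or non-positive values as "Unknown" are assumptions for illustration, not part of the FFIEC file.

```python
import math

def classify_income_level(pct_of_msa_median):
    """Map a tract's income percentage to an FFIEC-style tier (illustrative)."""
    # Assumption: missing or non-positive percentages are unclassified
    if pct_of_msa_median is None or math.isnan(pct_of_msa_median) or pct_of_msa_median <= 0:
        return "Unknown"
    if pct_of_msa_median < 50:
        return "Low"
    if pct_of_msa_median < 80:
        return "Moderate"
    if pct_of_msa_median < 120:
        return "Middle"
    return "Upper"

# Percentages taken from the first rows shown in Section 4.5
examples = [103.79, 73.60, 133.41]
labels = [classify_income_level(p) for p in examples]
print(labels)
```

The three sample percentages fall into the same tiers that the FFIEC file assigns to those tracts in the Section 4.5 output.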
2.3 TIGER/Line Shapefiles - Pennsylvania Census Tracts (2022)¶
- Source: U.S. Census Bureau Geography Division
- Purpose: Census tract boundary geometries for spatial visualization and mapping
- Access: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
- Specific File: tl_2022_42_tract (Pennsylvania state FIPS code: 42)
- Format: Shapefile (.shp, .shx, .dbf, .prj)
- Coverage: All Pennsylvania census tracts; filtered to Allegheny County (FIPS county code: 003) for analysis
- Use: Enables choropleth mapping of income classification and compound disadvantage scores by census tract geography
2.4 (Inspiration) ICIC and SRI's Economic Connectivity Dashboard¶
- Source: Initiative for a Competitive Inner City (ICIC) / SRI International
- Purpose: Influenced analytical approach to characterizing regional success across income tiers
- Access: https://icic.shinyapps.io/economic_connectivity_dashboard/
3. Initial Hypotheses¶
H1: Low-income tracts will demonstrate significantly lower employment rates and substantially higher unemployment rates compared to middle/upper income areas, indicating systematic barriers to labor market participation.
H2: Educational attainment, particularly Bachelor's degree completion, will be markedly lower in low-income tracts, with the gap concentrated at the four-year degree level rather than distributed across all post-secondary credentials.
H3: Low-income tracts will experience significantly lower broadband access rates, creating a digital divide that limits access to remote work, online education, telehealth, and essential digital services.
H4: Low-income tracts will exhibit higher housing instability, evidenced by elevated vacancy rates and greater concentrations of rent-burdened households paying 30% or more of income toward housing costs.
H5: Disadvantage will cluster geographically and dimensionally—the same tracts facing low income will simultaneously struggle across employment, education, infrastructure, and housing, demonstrating that economic challenges compound rather than distribute randomly across the county.
4. Data Collection & Processing¶
The analysis follows these steps:
- 4.1 System and Census API Configuration
- 4.2 Retrieve Census Data
- 4.3 Build Skeleton Dataframe
- 4.4 Merge FFIEC Income Data
- 4.5 Calculate Derived Metrics
- 4.6 Create Clean Output Table
4.1 System and Census API Configuration¶
Defines Census variable groups for labor markets, education, housing, and infrastructure.
# Library configuration
import pandas as pd
import numpy as np
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from matplotlib.patches import Patch
from matplotlib.gridspec import GridSpec
from scipy import stats
import geopandas as gpd
import warnings
from dotenv import load_dotenv
import os
load_dotenv()
import plotly.io as pio
# Set the default renderer for Plotly figures
pio.renderers.default = 'notebook'
warnings.filterwarnings('ignore')
# Jupyter inline plotting
%matplotlib inline
# Plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
plt.rcParams['figure.dpi'] = 110
plt.rcParams['font.size'] = 10
print("✓ Libraries imported successfully!")
# Census API setup
API_KEY = os.getenv("CENSUS_API_KEY")
BASE_URL = "https://api.census.gov/data/2022/acs/acs5"
# Variables to collect
VARIABLES = {
    "geographic": ["NAME"],
    "labor_market": ["B23025_002E", "B23025_003E", "B23025_004E", "B23025_005E"],
    "education": ["B15003_021E", "B15003_022E"],
    "housing": ["B25003_001E", "B25003_002E", "B25070_007E", "B25070_008E",
                "B25070_009E", "B25070_010E", "B25002_001E", "B25002_003E", "B19013_001E"],
    "connectivity": ["B28002_002E"],
    "transportation": ["B08301_010E"],
    "population": ["B01003_001E"]
}
# Flatten for API call
var_list = [var for category in VARIABLES.values() for var in category]
var_string = ",".join(var_list)
print(f"✓ Configured {len(var_list)} Census variables")
✓ Libraries imported successfully!
✓ Configured 19 Census variables
4.2 Retrieve Census Data¶
Note: this step may take about one minute to run due to the size of the nationwide request.
ACS tract-level data is retrieved for all 50 states and the District of Columbia (84,415 tracts, of which 83,531 match an FFIEC record in Section 4.4), well exceeding the row-count requirement. The Allegheny County subset is extracted downstream.
import requests
# Retrieve ACS tract-level data for all 50 states and the District of Columbia
states = [
    '01', '02', '04', '05', '06', '08', '09', '10', '11', '12',
    '13', '15', '16', '17', '18', '19', '20', '21', '22', '23',
    '24', '25', '26', '27', '28', '29', '30', '31', '32', '33',
    '34', '35', '36', '37', '38', '39', '40', '41', '42', '44',
    '45', '46', '47', '48', '49', '50', '51', '53', '54', '55', '56'
]
# Collect data for all tracts nationwide
all_data = []
for state in states:
    params = {
        "get": var_string,
        "for": "tract:*",
        "in": f"state:{state}",
        "key": API_KEY,
    }
    response = requests.get(BASE_URL, params=params)
    if response.status_code == 200:
        rows = response.json()
        # Append header only once
        if not all_data:
            all_data.extend(rows)
        else:
            all_data.extend(rows[1:])
    else:
        print(f"State {state}: request failed ({response.status_code})")
# Final dataset as list of rows
data = all_data
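The loop above only prints a message when a state request fails, so a transient error would silently drop that state's tracts. The following is a minimal sketch of one way to retry and record failures. `fetch_with_retry` and `collect_states` are hypothetical helpers, and the `fetch` callable (returning a list of rows on success and `None` on failure) is an assumed contract that stands in for the real `requests.get` call so the example runs offline.

```python
import time

def fetch_with_retry(fetch, state, retries=3, backoff_seconds=1.0):
    """Call fetch(state) until it returns rows or the retries run out."""
    for attempt in range(1, retries + 1):
        rows = fetch(state)
        if rows is not None:
            return rows
        if attempt < retries:
            # Wait a little longer after each failed attempt
            time.sleep(backoff_seconds * attempt)
    return None

def collect_states(fetch, states, **retry_kwargs):
    """Return (rows_with_single_header, failed_states)."""
    all_rows, failed = [], []
    for state in states:
        rows = fetch_with_retry(fetch, state, **retry_kwargs)
        if rows is None:
            failed.append(state)
            continue
        # Keep the header row only from the first successful response
        all_rows.extend(rows if not all_rows else rows[1:])
    return all_rows, failed

# Stand-in for the real request: state "02" always fails in this demo
def fake_fetch(state):
    if state == "02":
        return None
    return [["NAME", "state"], [f"Tract in {state}", state]]

demo_rows, demo_failed = collect_states(
    fake_fetch, ["01", "02", "04"], backoff_seconds=0.0
)
print(len(demo_rows), demo_failed)
```

Returning the list of failed states makes it possible to confirm that every state is present before building the dataframe.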
4.3 Build Skeleton Dataframe¶
Create standardized identifiers:
- 6-digit Census tract code: tract_6
- 11-digit GEOID: geoid = state + county + tract_6
These identifiers are required for merging with FFIEC (Federal Financial Institutions Examination Council) income classification data.
# Build DataFrame from API response
df = pd.DataFrame(data[1:], columns=data[0])
# Construct Census tract identifiers
df["tract_6"] = df["tract"].astype(str).str.zfill(6)
df["geoid"] = (
df["state"].astype(str).str.zfill(2) +
df["county"].astype(str).str.zfill(3) +
df["tract_6"]
)
# Convert applicable columns to numeric
exclude = ["NAME", "state", "county", "tract", "tract_6", "geoid"]
numeric_cols = [c for c in df.columns if c not in exclude]
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")
df.head()
| | NAME | B23025_002E | B23025_003E | B23025_004E | B23025_005E | B15003_021E | B15003_022E | B25003_001E | B25003_002E | B25070_007E | ... | B25002_003E | B19013_001E | B28002_002E | B08301_010E | B01003_001E | state | county | tract | tract_6 | geoid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Census Tract 201; Autauga County; Alabama | 738 | 732 | 713 | 19 | 84 | 182 | 700 | 519 | 0 | ... | 33 | 60563 | 643 | 0 | 1865 | 01 | 001 | 020100 | 020100 | 01001020100 |
| 1 | Census Tract 202; Autauga County; Alabama | 947 | 919 | 868 | 51 | 86 | 163 | 544 | 429 | 8 | ... | 136 | 57460 | 427 | 0 | 1861 | 01 | 001 | 020200 | 020200 | 01001020200 |
| 2 | Census Tract 203; Autauga County; Alabama | 1808 | 1781 | 1748 | 33 | 284 | 209 | 1305 | 912 | 9 | ... | 126 | 77371 | 1170 | 0 | 3492 | 01 | 001 | 020300 | 020300 | 01001020300 |
| 3 | Census Tract 204; Autauga County; Alabama | 1875 | 1854 | 1837 | 17 | 302 | 662 | 1666 | 1306 | 0 | ... | 56 | 73191 | 1563 | 53 | 3987 | 01 | 001 | 020400 | 020400 | 01001020400 |
| 4 | Census Tract 205.01; Autauga County; Alabama | 2504 | 2400 | 2386 | 14 | 319 | 763 | 1783 | 971 | 21 | ... | 74 | 79953 | 1759 | 72 | 4121 | 01 | 001 | 020501 | 020501 | 01001020501 |
5 rows × 24 columns
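The identifier logic above can be checked in isolation. This is a minimal sketch, assuming plain strings or integers as input; `build_geoid` and `split_geoid` are hypothetical helpers that mirror the zero-padding used in the notebook. The second call shows why padding matters when a GEOID has been read as an integer and lost its leading zero.

```python
def build_geoid(state, county, tract):
    """Zero-pad each FIPS component and concatenate into an 11-digit GEOID."""
    geoid = str(state).zfill(2) + str(county).zfill(3) + str(tract).zfill(6)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"Unexpected GEOID: {geoid!r}")
    return geoid

def split_geoid(geoid):
    """Split an 11-digit GEOID back into (state, county, tract_6)."""
    geoid = str(geoid).zfill(11)
    return geoid[:2], geoid[2:5], geoid[5:]

# First row of the table above: state 01, county 001, tract 020100
geoid = build_geoid("01", "001", "020100")

# A GEOID read as an integer loses its leading zero; zfill restores it
parts = split_geoid(1001020100)
print(geoid, parts)
```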
4.4 Merge FFIEC Income Data¶
FFIEC income variables categorize each census tract as Low, Moderate, Middle, or Upper income based on the tract's median family income relative to the MSA median. This classification enables analysis of opportunity gaps across income levels.
Specific fields (Tract MFI, % of AMI, and Income Level) are merged into the main dataframe using the geoid key.
# Load FFIEC 2022 income classification data
ffiec_path = "/Users/sofiahutton/Documents/Fall 2025 CMU Classes/visualizations with python /CensusTractList2022.xlsx"
# Read FFIEC tract sheet
ffiec = pd.read_excel(ffiec_path, sheet_name="2022 tracts")
ffiec.columns = ffiec.columns.str.strip()
# Build GEOID for merging
ffiec["geoid"] = ffiec["FIPS code"].astype(str).str.zfill(11)
# Identify income-related columns by partial match (handles naming variation)
mfi_col = [c for c in ffiec.columns if "mfi" in c.lower()][0]
pct_col = [c for c in ffiec.columns if "percentage" in c.lower()][0]
lvl_col = [c for c in ffiec.columns if "income level" in c.lower()][0]
# Standardize column names
ffiec_keep = ffiec[["geoid", mfi_col, pct_col, lvl_col]].rename(
columns={
mfi_col: "Tract MFI",
pct_col: "Tract income percentage",
lvl_col: "Tract income level",
}
)
# Remove any existing FFIEC columns to avoid duplicate/suffixed columns on re-run
cols_to_remove = [
"Tract MFI", "Tract income percentage", "Tract income level",
"Tract MFI_x", "Tract MFI_y",
"Tract income percentage_x", "Tract income percentage_y",
"Tract income level_x", "Tract income level_y"
]
df = df.drop(columns=[c for c in df.columns if c in cols_to_remove], errors="ignore")
# Merge FFIEC indicators into the ACS dataset
df = df.merge(ffiec_keep, on="geoid", how="left")
# Helper function for formatted summary
def print_income_summary(label, subset):
    dist = subset["Tract income level"].value_counts()
    total = dist.sum()
    print(f"\n{label} Income Classification:")
    for level in ["Upper", "Middle", "Moderate", "Low", "Unknown"]:
        count = dist.get(level, 0)
        share = count / total if total > 0 else 0
        print(f"- {level:8s}: {count:5,} ({share:.1%})")
# Nation-level summary (all tracts)
print_income_summary("Nationwide", df)
# Pennsylvania-only summary (state FIPS 42)
df_pa = df[df["state"] == "42"]
print_income_summary("Pennsylvania (state = 42)", df_pa)
# Allegheny County summary (state 42, county 003)
df_allegheny = df[(df["state"] == "42") & (df["county"] == "003")]
print_income_summary("Allegheny County (state 42, county 003)", df_allegheny)
Nationwide Income Classification:
- Upper   : 22,302 (26.7%)
- Middle  : 34,720 (41.6%)
- Moderate: 18,811 (22.5%)
- Low     : 5,430 (6.5%)
- Unknown : 2,268 (2.7%)

Pennsylvania (state = 42) Income Classification:
- Upper   :   813 (23.6%)
- Middle  : 1,630 (47.3%)
- Moderate:   711 (20.6%)
- Low     :   206 (6.0%)
- Unknown :    86 (2.5%)

Allegheny County (state 42, county 003) Income Classification:
- Upper   :   115 (29.2%)
- Middle  :   137 (34.8%)
- Moderate:    83 (21.1%)
- Low     :    38 (9.6%)
- Unknown :    21 (5.3%)
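Because the merge is a left join, ACS tracts with no FFIEC record stay in the dataframe with missing income fields. The following is a minimal sketch of how the match could be audited using the `validate` and `indicator` parameters of pandas `merge`. The two small frames are made-up stand-ins for `df` and `ffiec_keep`, not rows from the actual files.

```python
import pandas as pd

# Toy stand-ins for the ACS frame and the FFIEC lookup (made-up rows)
acs = pd.DataFrame({"geoid": ["42003010300", "42003020100", "42003980000"]})
ffiec_lookup = pd.DataFrame({
    "geoid": ["42003010300", "42003020100"],
    "Tract income level": ["Low", "Upper"],
})

# validate="m:1" raises if the FFIEC side has duplicate GEOIDs;
# indicator=True adds a _merge column recording where each row matched
merged = acs.merge(ffiec_lookup, on="geoid", how="left",
                   validate="m:1", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "geoid"].tolist()
match_counts = merged["_merge"].value_counts().to_dict()
print(match_counts)
print("Unmatched GEOIDs:", unmatched)
```

Listing the unmatched GEOIDs shows whether missing classifications come from formatting problems in the key or from tracts that are genuinely absent from the FFIEC file.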
4.5 Calculate Derived Metrics¶
Employment rates, vacancy, broadband access, rent burden, and other ratios are calculated.
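# Core tract-level rates (proportions, 0–1)
# Employment and unemployment are shares of the civilian labor force (B23025_003E)
df["employment_rate"] = df["B23025_004E"] / df["B23025_003E"]
df["unemployment_rate"] = df["B23025_005E"] / df["B23025_003E"]
# Owner-occupied share of occupied housing units
df["homeownership_rate"] = df["B25003_002E"] / df["B25003_001E"]
# Vacant share of all housing units
df["vacancy_rate"] = df["B25002_003E"] / df["B25002_001E"]
# B28002_002E counts households with an Internet subscription and B08301_010E
# counts workers commuting by public transit; both are divided by total
# population here, so they are per-capita shares, not household or worker rates
df["broadband_rate"] = df["B28002_002E"] / df["B01003_001E"]
df["transit_rate"] = df["B08301_010E"] / df["B01003_001E"]
# Households paying ≥30% of income on rent
df["rent_burdened_count"] = (
    df["B25070_007E"] +
    df["B25070_008E"] +
    df["B25070_009E"] +
    df["B25070_010E"]
)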
df.head()
| | NAME | B23025_002E | B23025_003E | B23025_004E | B23025_005E | B15003_021E | B15003_022E | B25003_001E | B25003_002E | B25070_007E | ... | Tract MFI | Tract income percentage | Tract income level | employment_rate | unemployment_rate | homeownership_rate | vacancy_rate | broadband_rate | transit_rate | rent_burdened_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Census Tract 201; Autauga County; Alabama | 738 | 732 | 713 | 19 | 84 | 182 | 700 | 519 | 0 | ... | 68115.0 | 103.79 | Middle | 0.974044 | 0.025956 | 0.741429 | 0.045020 | 0.344772 | 0.000000 | 74 |
| 1 | Census Tract 202; Autauga County; Alabama | 947 | 919 | 868 | 51 | 86 | 163 | 544 | 429 | 8 | ... | 68115.0 | 73.60 | Moderate | 0.944505 | 0.055495 | 0.788603 | 0.200000 | 0.229447 | 0.000000 | 66 |
| 2 | Census Tract 203; Autauga County; Alabama | 1808 | 1781 | 1748 | 33 | 284 | 209 | 1305 | 912 | 9 | ... | 68115.0 | 102.93 | Middle | 0.981471 | 0.018529 | 0.698851 | 0.088050 | 0.335052 | 0.000000 | 190 |
| 3 | Census Tract 204; Autauga County; Alabama | 1875 | 1854 | 1837 | 17 | 302 | 662 | 1666 | 1306 | 0 | ... | 68115.0 | 110.95 | Middle | 0.990831 | 0.009169 | 0.783914 | 0.032520 | 0.392024 | 0.013293 | 8 |
| 4 | Census Tract 205.01; Autauga County; Alabama | 2504 | 2400 | 2386 | 14 | 319 | 763 | 1783 | 971 | 21 | ... | 68115.0 | 133.41 | Upper | 0.994167 | 0.005833 | 0.544588 | 0.039849 | 0.426838 | 0.017471 | 398 |
5 rows × 34 columns
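Each ratio above is only interpretable when its numerator and denominator come from the same universe, and tracts with an empty universe should yield a missing value rather than an error. The following is a minimal sketch under those assumptions. `safe_rate` is a hypothetical helper, and the household total used for broadband is a made-up figure standing in for B28002_001E (the household universe of that ACS table), which the notebook does not download.

```python
import math

def safe_rate(numerator, denominator):
    """Return numerator / denominator, or NaN when the universe is empty or missing."""
    if numerator is None or denominator is None:
        return math.nan
    if math.isnan(numerator) or math.isnan(denominator) or denominator <= 0:
        return math.nan
    return numerator / denominator

# Employment: both counts come from the civilian labor force universe
# (employed and civilian labor force for the first tract shown above)
employment_rate = safe_rate(713, 732)

# Broadband: the numerator counts households, so a household denominator
# keeps the universe consistent. 700 is a made-up household total.
hypothetical_households = 700
broadband_household_share = safe_rate(643, hypothetical_households)

# A tract with no labor force yields NaN instead of a division error
empty_tract_rate = safe_rate(0, 0)
print(round(employment_rate, 6), round(broadband_household_share, 4), empty_tract_rate)
```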
4.6 Create Clean Output Table¶
A cleaned dataset (df_viz) is created for plotting:
- Rates converted to percentages
- Rent burden components consolidated
- Outliers and negative values addressed
- Final feature set prepared for EDA and visualization
# Column mapping (ACS variable code -> display label)
output_columns = {
    "tract_6": "Tract Code (6-digit)",
    "NAME": "Tract Name",
    "Tract MFI": "FFIEC Tract MFI (2022)",
    "Tract income percentage": "FFIEC Tract income % (2022)",
    "Tract income level": "FFIEC Tract income level (2022)",
    "B01003_001E": "Total Population",
    "B23025_003E": "Labor Force",
    "B23025_004E": "Employed",
    "B23025_005E": "Unemployed",
    "employment_rate": "Employment Rate",
    "unemployment_rate": "Unemployment Rate",
    "B15003_021E": "Associates Degree",
    "B15003_022E": "Bachelors or Higher",  # ACS B15003_022E counts bachelor's degrees only
    "B25003_001E": "Total Housing Units",  # ACS B25003_001E is occupied housing units
    "B25003_002E": "Owner-Occupied",
    "homeownership_rate": "Homeownership Rate",
    "B25002_003E": "Vacant Units",
    "vacancy_rate": "Vacancy Rate",
    "rent_burdened_count": "Rent Burdened (30%+)",
    "B19013_001E": "Median Household Income",
    "B28002_002E": "With Broadband",  # ACS B28002_002E is any Internet subscription
    "broadband_rate": "Broadband Rate",
    "B08301_010E": "Public Transit Commuters",
    "transit_rate": "Transit Rate",
}
# Create viz-ready dataframe
df_viz = df.copy()
df_viz = df_viz.rename(columns=output_columns)
# Drop B23025_002E (total in labor force); the rates above use the civilian labor force, B23025_003E
if 'B23025_002E' in df_viz.columns:
    df_viz = df_viz.drop(columns=['B23025_002E'])
    print("✓ Dropped duplicate B23025_002E column")
# Clean up the Tract Name to show only the 6-digit code
df_viz['Tract Name'] = df_viz['Tract Code (6-digit)']
# Additional derived variables
df_viz['Distress_Category'] = df_viz['FFIEC Tract income level (2022)'].apply(
    lambda x: 'Low/Moderate Income' if x in ['Low', 'Moderate'] else 'Middle/Upper Income'
)
# Both rates below are percentages. Denominators: total population for
# bachelor's degrees, occupied housing units (B25003_001E) for rent burden.
df_viz['Bachelors_Plus_Rate'] = (df_viz['Bachelors or Higher'] / df_viz['Total Population'] * 100)
df_viz['Rent_Burden_Rate'] = (df_viz['Rent Burdened (30%+)'] / df_viz['Total Housing Units'] * 100)
# Convert rates to percentages. From here on, all rate columns in df_viz are
# expressed as percentages (0–100), not proportions (0–1).
for col in ['Employment Rate', 'Unemployment Rate', 'Homeownership Rate',
            'Vacancy Rate', 'Broadband Rate', 'Transit Rate']:
    df_viz[col] = df_viz[col] * 100
# Drop the individual rent burden components (and the B25002_001E housing-unit total) since the derived columns are in place
rent_burden_components = ['B25070_007E', 'B25070_008E', 'B25070_009E', 'B25070_010E', 'B25002_001E']
df_viz = df_viz.drop(columns=rent_burden_components)
print(f"✓ Dropped {len(rent_burden_components)} rent burden component columns")
print("✓ Visualization dataset ready")
print(f"Shape: {df_viz.shape}")
df_viz.head()
✓ Dropped duplicate B23025_002E column
✓ Dropped 5 rent burden component columns
✓ Visualization dataset ready
Shape: (84415, 31)
| | Tract Name | Labor Force | Employed | Unemployed | Associates Degree | Bachelors or Higher | Total Housing Units | Owner-Occupied | Vacant Units | Median Household Income | ... | Employment Rate | Unemployment Rate | Homeownership Rate | Vacancy Rate | Broadband Rate | Transit Rate | Rent Burdened (30%+) | Distress_Category | Bachelors_Plus_Rate | Rent_Burden_Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 020100 | 732 | 713 | 19 | 84 | 182 | 700 | 519 | 33 | 60563 | ... | 97.404372 | 2.595628 | 74.142857 | 4.502046 | 34.477212 | 0.000000 | 74 | Middle/Upper Income | 9.758713 | 10.571429 |
| 1 | 020200 | 919 | 868 | 51 | 86 | 163 | 544 | 429 | 136 | 57460 | ... | 94.450490 | 5.549510 | 78.860294 | 20.000000 | 22.944653 | 0.000000 | 66 | Low/Moderate Income | 8.758732 | 12.132353 |
| 2 | 020300 | 1781 | 1748 | 33 | 284 | 209 | 1305 | 912 | 126 | 77371 | ... | 98.147108 | 1.852892 | 69.885057 | 8.805031 | 33.505155 | 0.000000 | 190 | Middle/Upper Income | 5.985109 | 14.559387 |
| 3 | 020400 | 1854 | 1837 | 17 | 302 | 662 | 1666 | 1306 | 56 | 73191 | ... | 99.083064 | 0.916936 | 78.391357 | 3.252033 | 39.202408 | 1.329320 | 8 | Middle/Upper Income | 16.603963 | 0.480192 |
| 4 | 020501 | 2400 | 2386 | 14 | 319 | 763 | 1783 | 971 | 74 | 79953 | ... | 99.416667 | 0.583333 | 54.458777 | 3.984922 | 42.683815 | 1.747149 | 398 | Middle/Upper Income | 18.514924 | 22.321929 |
5 rows × 31 columns
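The lambda that builds Distress_Category assigns every tract that is not Low or Moderate to Middle/Upper Income, which includes tracts whose FFIEC level is Unknown or missing after the left merge. The following is a minimal sketch of an explicit mapping that leaves those tracts unassigned. Excluding them from the two-group comparison is an assumption made here, not the notebook's behavior, and `TIER_TO_GROUP` and `distress_category` are hypothetical names.

```python
import math

# Explicit mapping from FFIEC tier to the two-group comparison
TIER_TO_GROUP = {
    "Low": "Low/Moderate Income",
    "Moderate": "Low/Moderate Income",
    "Middle": "Middle/Upper Income",
    "Upper": "Middle/Upper Income",
}

def distress_category(level):
    """Return the comparison group, or None for 'Unknown' or missing tiers."""
    if level is None or (isinstance(level, float) and math.isnan(level)):
        return None
    return TIER_TO_GROUP.get(level)

levels = ["Low", "Moderate", "Middle", "Upper", "Unknown", float("nan")]
groups = [distress_category(v) for v in levels]
print(groups)
```

On a pandas Series the same mapping can be applied with `.map(TIER_TO_GROUP)`, which returns NaN for any value that is not a key of the dictionary.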
5. Exploratory Data Analysis¶
5.1 Summary Statistics¶
Purpose: establish a contextual baseline to clarify whether local disparities reflect uniquely Pittsburgh-specific challenges or broader national patterns.
This section provides a high-level comparison of tract-level socioeconomic conditions across three geographies:
- Allegheny County (local)
- Pennsylvania (state)
- Nationwide (benchmark)
For each region, the table reports:
- Aggregate population
- Average labor-market conditions (employment, unemployment)
- Educational attainment (BA+ rate)
- Housing stability indicators (homeownership, vacancy)
- Broadband access
- Median household income
# Create geographic filters and CLEAN the data
allegheny = df_viz[(df_viz['state'] == '42') & (df_viz['county'] == '003')].copy()
pennsylvania = df_viz[df_viz['state'] == '42'].copy()
nationwide = df_viz.copy()
# Clean Median Household Income (remove negative values and extreme outliers)
for df_geo in [allegheny, pennsylvania, nationwide]:
    df_geo.loc[df_geo['Median Household Income'] <= 0, 'Median Household Income'] = np.nan
    df_geo.loc[df_geo['Median Household Income'] > 500000, 'Median Household Income'] = np.nan
key_metrics = ['Total Population', 'Employment Rate', 'Unemployment Rate',
'Bachelors_Plus_Rate', 'Median Household Income', 'Broadband Rate',
'Homeownership Rate', 'Vacancy Rate']
print("="*80)
print("SUMMARY STATISTICS: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE")
print("="*80)
# Create comparison table
summary_comparison = pd.DataFrame({
'Allegheny County': [
allegheny['Total Population'].sum(),
allegheny['Employment Rate'].mean(),
allegheny['Unemployment Rate'].mean(),
allegheny['Bachelors_Plus_Rate'].mean(),
allegheny['Median Household Income'].mean(),
allegheny['Broadband Rate'].mean(),
allegheny['Homeownership Rate'].mean(),
allegheny['Vacancy Rate'].mean()
],
'Pennsylvania': [
pennsylvania['Total Population'].sum(),
pennsylvania['Employment Rate'].mean(),
pennsylvania['Unemployment Rate'].mean(),
pennsylvania['Bachelors_Plus_Rate'].mean(),
pennsylvania['Median Household Income'].mean(),
pennsylvania['Broadband Rate'].mean(),
pennsylvania['Homeownership Rate'].mean(),
pennsylvania['Vacancy Rate'].mean()
],
'Nationwide': [
nationwide['Total Population'].sum(),
nationwide['Employment Rate'].mean(),
nationwide['Unemployment Rate'].mean(),
nationwide['Bachelors_Plus_Rate'].mean(),
nationwide['Median Household Income'].mean(),
nationwide['Broadband Rate'].mean(),
nationwide['Homeownership Rate'].mean(),
nationwide['Vacancy Rate'].mean()
]
}, index=key_metrics)
print(f"\n{'Geography':<25} {'Allegheny':<20} {'Pennsylvania':<20} {'Nationwide':<20}")
print("-" * 90)
print(f"{'Census Tracts':<25} {len(allegheny):<20,} {len(pennsylvania):<20,} {len(nationwide):<20,}")
print("\nValues (Total Population = SUM, others = MEAN):")
print("-" * 90)
# Format each row nicely
for metric in key_metrics:
    allegheny_val = summary_comparison.loc[metric, 'Allegheny County']
    pa_val = summary_comparison.loc[metric, 'Pennsylvania']
    nation_val = summary_comparison.loc[metric, 'Nationwide']
    if metric == 'Total Population':
        print(f"{metric:<25} {allegheny_val:>20,.0f} {pa_val:>20,.0f} {nation_val:>20,.0f}")
    elif metric == 'Median Household Income':
        print(f"{metric:<25} ${allegheny_val:>19,.2f} ${pa_val:>19,.2f} ${nation_val:>19,.2f}")
    else:
        print(f"{metric:<25} {allegheny_val:>20.2f} {pa_val:>20.2f} {nation_val:>20.2f}")
# Data quality note
print("\n" + "="*90)
print("DATA QUALITY NOTE:")
print(f" Allegheny: {allegheny['Median Household Income'].isna().sum()} tracts with missing/invalid income data")
print(f" Pennsylvania: {pennsylvania['Median Household Income'].isna().sum()} tracts with missing/invalid income data")
print(f" Nationwide: {nationwide['Median Household Income'].isna().sum()} tracts with missing/invalid income data")
================================================================================
SUMMARY STATISTICS: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE
================================================================================

Geography                 Allegheny            Pennsylvania         Nationwide
------------------------------------------------------------------------------------------
Census Tracts             394                  3,446                84,415

Values (Total Population = SUM, others = MEAN):
------------------------------------------------------------------------------------------
Total Population                     1,245,310           12,989,208          331,097,593
Employment Rate                          94.42                94.31                94.36
Unemployment Rate                         5.58                 5.69                 5.64
Bachelors_Plus_Rate                      17.60                13.92                14.07
Median Household Income   $          75,812.23 $          77,527.23 $          80,716.70
Broadband Rate                           39.81                35.45                34.06
Homeownership Rate                       62.99                68.40                64.77
Vacancy Rate                             10.20                 9.94                10.73

==========================================================================================
DATA QUALITY NOTE:
  Allegheny: 13 tracts with missing/invalid income data
  Pennsylvania: 63 tracts with missing/invalid income data
  Nationwide: 1517 tracts with missing/invalid income data
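Apart from Total Population, the rows in this table are unweighted means across tracts, so a tract with 1,000 residents counts as much as one with 8,000. The following is a minimal sketch of a population-weighted alternative. It uses made-up numbers rather than values from the dataset, and `weighted_mean` is a hypothetical helper.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean that skips pairs with a missing value or a non-positive weight."""
    total, weight_sum = 0.0, 0.0
    for value, weight in zip(values, weights):
        if value is None or weight is None:
            continue
        if math.isnan(value) or math.isnan(weight) or weight <= 0:
            continue
        total += value * weight
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else math.nan

# Made-up example: two small tracts and one large tract
rates = [10.0, 10.0, 4.0]            # unemployment rate, percent
population = [1_000, 1_000, 8_000]

unweighted = sum(rates) / len(rates)          # every tract counts equally
weighted = weighted_mean(rates, population)   # every resident counts equally
print(unweighted, weighted)
```

The two figures answer different questions: the unweighted mean describes the typical tract, while the weighted mean describes the typical resident.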
5.1 Key Patterns Observed¶
• Allegheny County’s employment and unemployment rates closely track statewide and national averages, suggesting broadly similar labor-market conditions.
• Educational attainment and broadband access are higher in Allegheny (BA+ rate 17.6%, broadband 39.8%) than in Pennsylvania (13.9%, 35.5%) or the U.S. overall (14.1%, 34.1%).
• Homeownership is lower in Allegheny (63.0%) than in Pennsylvania (68.4%) and slightly below the national tract average (64.8%), consistent with the county’s older rental housing stock and more urban development patterns.
• Vacancy rates are similar to the national average, though slightly higher than those for Pennsylvania as a whole.
• Missing income values appear across all geographies, but Allegheny has relatively few tracts with incomplete data (13 total).
5.2 Income Distribution Profiles: Allegheny vs. Pennsylvania vs. Nationwide¶
Using FFIEC income categories (Low, Moderate, Middle, Upper), this section compares how demographic and economic conditions vary within each region by income group.
For each income tier, the notebook reports:
- Total population
- Employment and unemployment rates
- Educational attainment (BA+ share)
- Median household income
- Broadband access
- Homeownership and vacancy rates
- Sample sizes (tract counts)
print("\n" + "="*90)
print("INCOME LEVEL BREAKDOWN: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE")
print("="*90)
income_levels = ['Low', 'Moderate', 'Middle', 'Upper']
# Create comparison for each geography
for geo_name, geo_data in [('Allegheny County', allegheny),
                           ('Pennsylvania', pennsylvania),
                           ('Nationwide', nationwide)]:
    print(f"\n{geo_name.upper()}")
    print("="*90)
    # Group by income level
    grouped = geo_data.groupby('FFIEC Tract income level (2022)')
    # Calculate: SUM for Total Population, MEAN for everything else
    income_breakdown = grouped.agg({
        'Total Population': 'sum',
        'Employment Rate': 'mean',
        'Unemployment Rate': 'mean',
        'Bachelors_Plus_Rate': 'mean',
        'Median Household Income': 'mean',
        'Broadband Rate': 'mean',
        'Homeownership Rate': 'mean',
        'Vacancy Rate': 'mean'
    })
    # Format and display
    print(f"\n{'Metric':<30} {'Low':<15} {'Moderate':<15} {'Middle':<15} {'Upper':<15}")
    print("-"*90)
    for metric in key_metrics:
        print(f"{metric:<30}", end="")
        for income_level in income_levels:
            if income_level not in income_breakdown.index:
                print(f"{'N/A':>14} ", end="")
                continue
            val = income_breakdown.loc[income_level, metric]
            # Counts, dollar amounts, and rates each get their own format
            if metric == 'Total Population':
                print(f"{val:>14,.0f} ", end="")
            elif metric == 'Median Household Income':
                print(f"${val:>13,.0f} ", end="")
            else:
                print(f"{val:>14.2f} ", end="")
        print()
    # Sample sizes
    print("\n" + "-"*90)
    print("Sample Sizes (number of census tracts):")
    tract_counts = geo_data['FFIEC Tract income level (2022)'].value_counts()
    print(f"{'Low:':<15} {tract_counts.get('Low', 0):>6,} | ", end="")
    print(f"{'Moderate:':<15} {tract_counts.get('Moderate', 0):>6,} | ", end="")
    print(f"{'Middle:':<15} {tract_counts.get('Middle', 0):>6,} | ", end="")
    print(f"{'Upper:':<15} {tract_counts.get('Upper', 0):>6,}")
==========================================================================================
INCOME LEVEL BREAKDOWN: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE
==========================================================================================

ALLEGHENY COUNTY
==========================================================================================

Metric                         Low             Moderate        Middle          Upper
------------------------------------------------------------------------------------------
Total Population                      91,223        219,912        437,560        479,547
Employment Rate                        89.25          92.55          95.30          96.72
Unemployment Rate                      10.75           7.45           4.70           3.28
Bachelors_Plus_Rate                     7.51          12.87          18.87          23.26
Median Household Income       $       33,979 $       49,919 $       72,018 $      114,459
Broadband Rate                         33.57          40.04          41.72          39.32
Homeownership Rate                     34.57          55.04          67.35          75.52
Vacancy Rate                           19.74          13.67           8.01           6.46

------------------------------------------------------------------------------------------
Sample Sizes (number of census tracts):
Low: 38 | Moderate: 83 | Middle: 137 | Upper: 115

PENNSYLVANIA
==========================================================================================

Metric                         Low             Moderate        Middle          Upper
------------------------------------------------------------------------------------------
Total Population                     683,935      2,456,728      6,341,164      3,375,388
Employment Rate                        88.66          92.35          95.12          95.95
Unemployment Rate                      11.34           7.65           4.88           4.05
Bachelors_Plus_Rate                     6.34           9.92          13.44          20.70
Median Household Income       $       36,741 $       54,639 $       75,394 $      113,158
Broadband Rate                         31.73          34.92          35.59          37.41
Homeownership Rate                     36.68          56.63          74.12          76.89
Vacancy Rate                           13.83          12.39           9.94           6.38

------------------------------------------------------------------------------------------
Sample Sizes (number of census tracts):
Low: 206 | Moderate: 711 | Middle: 1,630 | Upper: 813

NATIONWIDE
==========================================================================================

Metric                         Low             Moderate        Middle          Upper
------------------------------------------------------------------------------------------
Total Population                  18,412,899     71,654,894    138,968,321     95,017,217
Employment Rate                        89.52          92.95          95.01          95.91
Unemployment Rate                      10.48           7.05           4.99           4.09
Bachelors_Plus_Rate                     6.70           9.57          13.46          20.76
Median Household Income       $       39,011 $       56,105 $       76,431 $      118,666
Broadband Rate                         30.17          32.38          34.42          36.22
Homeownership Rate                     31.72          52.88          70.10          76.12
Vacancy Rate                           13.30          11.47          11.00           8.76

------------------------------------------------------------------------------------------
Sample Sizes (number of census tracts):
Low: 5,430 | Moderate: 18,811 | Middle: 34,720 | Upper: 22,302
5.2 Key Patterns: Income Distribution Profiles¶
• Across Allegheny County, Pennsylvania, and the U.S., socioeconomic conditions vary sharply by income tier, with low-income tracts consistently showing weaker outcomes across all indicators.
• Low-income tracts exhibit substantially lower educational attainment, which tracks closely with lower employment and higher unemployment in every geography.
• Upper-income tracts in Allegheny County have higher BA+ rates (23.3%) than upper-income tracts statewide (20.7%) or nationally (20.8%), and the same holds in every other income tier, pointing to the region's concentration of highly educated neighborhoods.
• Homeownership shows the largest percentage-point gradient among the rate indicators, with gaps of roughly 40–44 percentage points between low-income and upper-income tracts in all three geographies.
• Vacancy rates are much higher in low-income tracts, especially in Allegheny County (19.7% versus 6.5% in upper-income tracts), indicating localized housing distress.
• Broadband access generally rises with income, though the gap is smaller than for education or housing, and in Allegheny County the upper tier (39.3%) sits slightly below the middle tier (41.7%).
5.3 Opportunity Gaps: Lower-Income vs. Higher-Income Tracts¶
To highlight structural inequities more clearly, this section collapses the FFIEC categories into:
- Low/Moderate Income Tracts
- Middle/Upper Income Tracts
For each geography (Allegheny, Pennsylvania, U.S.), it reports:
- Employment rate & gap
- Unemployment rate & gap
- Broadband access & gap
- Homeownership & gap
- Vacancy & gap
- Transit use differences
- Corresponding population and tract counts
print("\n" + "="*90)
print("OPPORTUNITY GAPS: LOW/MODERATE vs MIDDLE/UPPER INCOME")
print("Comparison across Allegheny County, Pennsylvania, and Nationwide")
print("="*90)

gap_metrics = ['Employment Rate', 'Unemployment Rate', 'Broadband Rate',
               'Homeownership Rate', 'Vacancy Rate', 'Transit Rate']

# Calculate gaps for each geography
results = []
for geo_name, geo_data in [('Allegheny', allegheny),
                           ('Pennsylvania', pennsylvania),
                           ('Nationwide', nationwide)]:
    low_mod = geo_data[geo_data['Distress_Category'] == 'Low/Moderate Income'][gap_metrics].mean()
    mid_upper = geo_data[geo_data['Distress_Category'] == 'Middle/Upper Income'][gap_metrics].mean()
    gap = mid_upper - low_mod

    # Get population and tract counts
    low_mod_pop = geo_data[geo_data['Distress_Category'] == 'Low/Moderate Income']['Total Population'].sum()
    mid_upper_pop = geo_data[geo_data['Distress_Category'] == 'Middle/Upper Income']['Total Population'].sum()
    low_mod_tracts = len(geo_data[geo_data['Distress_Category'] == 'Low/Moderate Income'])
    mid_upper_tracts = len(geo_data[geo_data['Distress_Category'] == 'Middle/Upper Income'])

    results.append({
        'Geography': geo_name,
        'Low/Mod_n': low_mod_tracts,
        'Mid/Upper_n': mid_upper_tracts,
        'Low/Mod_pop': low_mod_pop,
        'Mid/Upper_pop': mid_upper_pop,
        **{f'{metric}_LowMod': low_mod[metric] for metric in gap_metrics},
        **{f'{metric}_MidUpper': mid_upper[metric] for metric in gap_metrics},
        **{f'{metric}_Gap': gap[metric] for metric in gap_metrics}
    })

# Display in organized format
for metric in gap_metrics:
    print(f"\n{metric.upper()}")
    print("-"*90)
    print(f"{'Geography':<15} {'Low/Moderate':<15} {'Middle/Upper':<15} {'Gap':<15} {'% Difference':<15}")
    print("-"*90)
    for result in results:
        low_mod_val = result[f'{metric}_LowMod']
        mid_upper_val = result[f'{metric}_MidUpper']
        gap_val = result[f'{metric}_Gap']
        # Relative difference: gap as a percentage of the low/moderate value
        pct_diff = (gap_val / low_mod_val * 100) if low_mod_val != 0 else 0
        print(f"{result['Geography']:<15} {low_mod_val:>14.2f} {mid_upper_val:>14.2f} {gap_val:>14.2f} {pct_diff:>14.1f}%")

# Sample sizes with population totals
print("\n" + "="*90)
print("SAMPLE SIZES & POPULATIONS")
print("="*90)
print(f"{'Geography':<15} {'Category':<20} {'Tracts':<15} {'Total Population':<20}")
print("-"*90)
for result in results:
    print(f"{result['Geography']:<15} {'Low/Moderate Income':<20} {result['Low/Mod_n']:>14,} {result['Low/Mod_pop']:>19,.0f}")
    print(f"{'':15} {'Middle/Upper Income':<20} {result['Mid/Upper_n']:>14,} {result['Mid/Upper_pop']:>19,.0f}")
    print()
==========================================================================================
OPPORTUNITY GAPS: LOW/MODERATE vs MIDDLE/UPPER INCOME
Comparison across Allegheny County, Pennsylvania, and Nationwide
==========================================================================================
EMPLOYMENT RATE
------------------------------------------------------------------------------------------
Geography Low/Moderate Middle/Upper Gap % Difference
------------------------------------------------------------------------------------------
Allegheny 91.51 95.76 4.25 4.6%
Pennsylvania 91.52 95.34 3.81 4.2%
Nationwide 92.18 95.25 3.07 3.3%
UNEMPLOYMENT RATE
------------------------------------------------------------------------------------------
Geography Low/Moderate Middle/Upper Gap % Difference
------------------------------------------------------------------------------------------
Allegheny 8.49 4.24 -4.25 -50.0%
Pennsylvania 8.48 4.66 -3.81 -45.0%
Nationwide 7.82 4.75 -3.07 -39.3%
BROADBAND RATE
------------------------------------------------------------------------------------------
Geography Low/Moderate Middle/Upper Gap % Difference
------------------------------------------------------------------------------------------
Allegheny 38.00 40.64 2.64 6.9%
Pennsylvania 34.20 35.91 1.71 5.0%
Nationwide 31.89 34.95 3.06 9.6%
HOMEOWNERSHIP RATE
------------------------------------------------------------------------------------------
Geography Low/Moderate Middle/Upper Gap % Difference
------------------------------------------------------------------------------------------
Allegheny 48.61 69.63 21.01 43.2%
Pennsylvania 52.15 74.40 22.25 42.7%
Nationwide 48.14 71.59 23.45 48.7%
VACANCY RATE
------------------------------------------------------------------------------------------
Geography Low/Moderate Middle/Upper Gap % Difference
------------------------------------------------------------------------------------------
Allegheny 15.58 7.72 -7.86 -50.5%
Pennsylvania 12.71 8.91 -3.80 -29.9%
Nationwide 11.88 10.26 -1.62 -13.6%
TRANSIT RATE
------------------------------------------------------------------------------------------
Geography Low/Moderate Middle/Upper Gap % Difference
------------------------------------------------------------------------------------------
Allegheny 6.02 3.13 -2.89 -48.1%
Pennsylvania 3.49 1.67 -1.82 -52.1%
Nationwide 2.64 1.54 -1.10 -41.7%
==========================================================================================
SAMPLE SIZES & POPULATIONS
==========================================================================================
Geography Category Tracts Total Population
------------------------------------------------------------------------------------------
Allegheny Low/Moderate Income 121 311,135
Middle/Upper Income 273 934,175
Pennsylvania Low/Moderate Income 917 3,140,663
Middle/Upper Income 2,529 9,848,545
Nationwide Low/Moderate Income 24,241 90,067,793
Middle/Upper Income 60,174 241,029,800
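The group means above are simple averages over tracts, so a tract with 1,000 residents counts as much as one with 8,000. A minimal sketch of a population-weighted alternative follows, written in plain Python so it runs without the notebook's DataFrames. The tract values are made up for illustration, and `simple_mean` and `weighted_mean` are hypothetical helpers, not part of the original analysis.

```python
def simple_mean(values):
    """Unweighted mean: every tract counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Population-weighted mean: larger tracts count proportionally more."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Illustrative (made-up) tracts: unemployment rate (%) and total population
unemployment = [12.0, 8.0, 4.0]
population = [1_000, 3_000, 6_000]

print(f"Simple mean:   {simple_mean(unemployment):.2f}")
print(f"Weighted mean: {weighted_mean(unemployment, population):.2f}")
```

With the notebook's DataFrames, a comparable calculation would be `np.average(group[metric], weights=group['Total Population'])`, assuming neither column has missing values. Whether weighting is appropriate depends on the question: tract means describe the typical neighborhood, weighted means describe the typical resident.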
5.3 Key Patterns: Opportunity Gaps Between Lower-Income and Higher-Income Tracts¶
• Employment and unemployment gaps are large and persistent: unemployment rates in middle/upper-income tracts are roughly 40–50% lower than in low/moderate-income tracts (a gap of about 3–4 percentage points) at all geographic levels.
• Homeownership shows the widest disparity, with higher-income tracts maintaining ownership rates 21–23 percentage points above lower-income areas, evidence of deep structural divides in wealth and housing stability.
• Vacancy gaps are particularly striking in Allegheny County, where vacancy in lower-income tracts is roughly double that of higher-income tracts (15.6% versus 7.7%), signaling concentrated neighborhood distress.
• Transit use is substantially higher in lower-income tracts, especially in Allegheny (6.0% versus 3.1%), which may reflect differences in car access rather than transit availability, although this data cannot separate the two.
• Broadband divides persist, but the magnitude of these gaps is smaller than those for housing or labor-market outcomes.
• Overall, Allegheny’s opportunity gaps mirror statewide and national patterns, though disparities in vacancy and transportation appear somewhat sharper locally.
6. Visualizations¶
The following visualizations explore economic opportunity indicators across Allegheny County census tracts.
6.1 Opportunity Gaps Across Income Levels¶
• Description: Horizontal grouped bar chart comparing relative (percent) differences between Middle/Upper and Low/Moderate income tracts across six key metrics (Employment Rate, Unemployment Rate, Broadband Access, Homeownership Rate, Vacancy Rate, Bachelor's Degree+).
• Objective: Establish the magnitude of opportunity gaps and demonstrate that disparities exist across multiple dimensions, not just one or two isolated metrics.
• Methodology: Calculate mean values for each metric within Low/Moderate and Middle/Upper income groups for three geographies (Allegheny County, Pennsylvania, Nationwide). Display each gap as a percentage of the Low/Moderate value, with color-coded bars by geography.
# 6.1 Opportunity gaps across income levels
import plotly.graph_objects as go
import plotly.express as px # kept in case used elsewhere
# Metrics to compare between low/moderate- and middle/upper-income tracts
gap_metrics = [
"Employment Rate",
"Unemployment Rate",
"Broadband Rate",
"Homeownership Rate",
"Vacancy Rate",
"Bachelors_Plus_Rate",
]
gap_data = []
# Compute percentage differences by geography
for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    low_mod = geo_data[geo_data["Distress_Category"] == "Low/Moderate Income"][gap_metrics].mean()
    mid_upper = geo_data[geo_data["Distress_Category"] == "Middle/Upper Income"][gap_metrics].mean()
    for metric in gap_metrics:
        # Gap in percentage points, then as a share of the low/moderate value
        gap = mid_upper[metric] - low_mod[metric]
        pct_diff = (gap / low_mod[metric] * 100) if low_mod[metric] != 0 else 0
        gap_data.append(
            {
                "Geography": geo_name,
                "Metric": metric,
                "Gap": gap,
                "Percent_Diff": pct_diff,
            }
        )
gap_df = pd.DataFrame(gap_data)
# Colors by geography
colors = {
"Allegheny County": "#1f77b4", # blue
"Pennsylvania": "#9467bd", # purple
"Nationwide": "#ff7f0e", # orange
}
fig = go.Figure()
for geo in ["Allegheny County", "Pennsylvania", "Nationwide"]:
    geo_data = gap_df[gap_df["Geography"] == geo]
    fig.add_trace(
        go.Bar(
            y=geo_data["Metric"],
            x=geo_data["Percent_Diff"],
            name=geo,
            orientation="h",
            marker=dict(color=colors[geo], opacity=0.8),
            text=geo_data["Percent_Diff"].round(1).astype(str) + "%",
            textposition="outside",
            textfont=dict(size=11, family="Arial, sans-serif"),
            hovertemplate=(
                "<b>%{y}</b><br>"
                f"<b>{geo}</b><br>"
                "Gap: %{x:.1f}%<br>"
                "<extra></extra>"
            ),
        )
    )
fig.update_layout(
title=dict(
text="Opportunity Gaps Across Income Levels",
x=0.5,
xanchor="center",
font=dict(size=24, family="Arial, sans-serif", color="#2c3e50"),
),
annotations=[
dict(
text="Percentage difference: Middle/Upper Income minus Low/Moderate Income tracts",
x=0.5,
y=-0.15,
xref="paper",
yref="paper",
xanchor="center",
yanchor="top",
showarrow=False,
font=dict(size=12, color="#7f8c8d", family="Arial, sans-serif"),
)
],
xaxis=dict(
title="Relative Difference (% of Low/Moderate value)",
title_font=dict(size=14, family="Arial, sans-serif"),
gridcolor="#ecf0f1",
tickfont=dict(size=12, family="Arial, sans-serif"),
),
yaxis=dict(
title="",
tickfont=dict(size=13, family="Arial, sans-serif"),
),
barmode="group",
height=550,
width=1100,
template="plotly_white",
plot_bgcolor="white",
paper_bgcolor="white",
legend=dict(
orientation="h",
yanchor="top",
y=1.12,
xanchor="center",
x=0.5,
font=dict(size=13, family="Arial, sans-serif"),
bgcolor="rgba(255,255,255,0.8)",
bordercolor="#bdc3c7",
borderwidth=1,
),
font=dict(family="Arial, sans-serif"),
margin=dict(t=120, b=100, l=150, r=80),
)
fig.show()
Interpretation: On every dimension measured, middle/upper income tracts outperform low/moderate income areas. Note that the bars show relative differences (the gap as a percentage of the low/moderate value), not percentage points. Two gaps stand out as particularly large in relative terms: unemployment and education.
The unemployment disparity is stark: unemployment rates in middle/upper income tracts are 39-50% lower than in low/moderate income tracts. In absolute terms, lower-income areas average roughly 8% unemployment while higher-income areas sit between 4% and 5%, a gap of about 3-4 percentage points that represents a fundamental difference in labor market access and economic stability.
Education shows an equally dramatic divide. Bachelor's degree attainment is 72-84% higher in middle/upper income tracts than in low/moderate ones, revealing that higher education remains concentrated in already-privileged communities. Homeownership follows a similar pattern, with relative gaps of 43-49% (21-23 percentage points), reflecting both affordability barriers and limited wealth-building opportunities in lower-income neighborhoods.
Even infrastructure shows meaningful disparities. Broadband access is 5-10% lower in relative terms (about 2-3 percentage points) in lower-income areas, limiting access to remote work, online education, and digital services. Employment gaps (3-4 percentage points) appear smaller but still indicate persistent barriers in the labor market.
What's striking is the consistency: Allegheny County closely tracks Pennsylvania and nationwide patterns on most indicators, with vacancy the main exception (the relative gap is -50.5% locally versus -13.6% nationwide). This suggests we're seeing largely systemic inequities rather than local anomalies, and that addressing them will require more than isolated, place-specific interventions.
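Because the chart plots `Percent_Diff` (a relative difference) while the 5.3 tables also report the gap in percentage points, it helps to compute both side by side. The sketch below does that for the Allegheny homeownership means printed in 5.3. `gap_summary` is a hypothetical helper introduced only for illustration.

```python
def gap_summary(low_mod, mid_upper):
    """Return (percentage-point gap, relative % difference vs. low/moderate)."""
    pp_gap = mid_upper - low_mod
    # Undefined when the baseline is zero; NaN makes that visible downstream
    pct_diff = pp_gap / low_mod * 100 if low_mod != 0 else float("nan")
    return pp_gap, pct_diff


# Allegheny homeownership means from the 5.3 output (already rounded)
pp_gap, pct_diff = gap_summary(low_mod=48.61, mid_upper=69.63)
print(f"Gap: {pp_gap:.2f} percentage points")
print(f"Relative difference: {pct_diff:.1f}%")
```

The results can differ from the printed table in the last digit because the inputs here are already rounded. The sketch returns NaN for a zero baseline where the notebook returns 0; that is a design choice, since a 0 would plot as "no gap".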
6.2 Economic Opportunity Scorecard by Geography and Income Level¶
• Description: Heatmap displaying mean values for seven economic indicators across four income levels (Low, Moderate, Middle, Upper) and three geographies (Allegheny, Pennsylvania, Nationwide).
• Objective: Provide a comprehensive, at-a-glance comparison showing how outcomes vary simultaneously across income levels and geographic scales.
• Methodology: Calculate mean values for each metric by income level and geography. Normalize color scale so blue = better outcomes and red = worse outcomes across all metrics (inverting Unemployment and Vacancy rates). Display values in cells with color intensity representing relative performance.
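The normalization step described above can be sketched in isolation. This is a minimal plain-Python version under the assumption that each metric is min-max scaled on its own and that Unemployment and Vacancy are flipped so 100 always means the best outcome. `normalize_scores` is a hypothetical helper, and the inputs are the four Allegheny tier means from the income level breakdown table (the notebook scales across all twelve geography-tier rows, so its scores will differ).

```python
LOWER_IS_BETTER = {"Unemployment Rate", "Vacancy Rate"}


def normalize_scores(values, lower_is_better=False):
    """Min-max scale values to 0-100 so that 100 is always the best outcome."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Flat metric: no spread to scale, so place every group at the midpoint
        return [50.0] * len(values)
    scaled = [(v - lo) / (hi - lo) * 100 for v in values]
    return [100 - s for s in scaled] if lower_is_better else scaled


# Allegheny County tier means in the order Low, Moderate, Middle, Upper
metrics = {
    "Employment Rate": [89.25, 92.55, 95.30, 96.72],
    "Vacancy Rate": [19.74, 13.67, 8.01, 6.46],
}

for name, values in metrics.items():
    scores = normalize_scores(values, lower_is_better=name in LOWER_IS_BETTER)
    print(name, [round(s, 1) for s in scores])
```

Because each metric is scaled separately, a color only ranks groups within one column; two cells with the same color in different columns do not imply gaps of the same size.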
# 6.2 Economic opportunity scorecard heatmap
import plotly.graph_objects as go
import numpy as np
# Metrics to include in the heatmap
metrics_for_heatmap = [
"Employment Rate",
"Unemployment Rate",
"Bachelors_Plus_Rate",
"Median Household Income",
"Broadband Rate",
"Homeownership Rate",
"Vacancy Rate",
]
heatmap_data = []
# Aggregate mean values by geography and FFIEC income level
for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    for income_level in ["Low", "Moderate", "Middle", "Upper"]:
        subset = geo_data[geo_data["FFIEC Tract income level (2022)"] == income_level]
        if len(subset) == 0:
            continue
        row_data = {
            "Geography": geo_name,
            "Income Level": income_level,
        }
        for metric in metrics_for_heatmap:
            if metric == "Median Household Income":
                # Convert to thousands for display
                row_data[metric] = subset[metric].mean() / 1000
            else:
                row_data[metric] = subset[metric].mean()
        heatmap_data.append(row_data)
heatmap_df = pd.DataFrame(heatmap_data)
# Row labels (geography + income tier)
heatmap_df["Label"] = heatmap_df["Geography"] + " - " + heatmap_df["Income Level"]
# Normalize all metrics to a 0–100 scale where higher = better outcome
z_data_normalized = []
display_values = []
for metric in metrics_for_heatmap:
    col_data = heatmap_df[metric].values
    min_val, max_val = col_data.min(), col_data.max()
    if max_val == min_val:
        # Avoid divide-by-zero; flat metric across all groups
        normalized = np.full_like(col_data, 50, dtype=float)
    else:
        if metric in ["Unemployment Rate", "Vacancy Rate"]:
            # Lower is better: invert so high normalized = better outcome
            normalized = 100 - ((col_data - min_val) / (max_val - min_val) * 100)
        else:
            # Higher is better
            normalized = (col_data - min_val) / (max_val - min_val) * 100
    z_data_normalized.append(normalized)
    display_values.append(col_data)
z_data_normalized = np.array(z_data_normalized).T # shape: (rows, metrics)
display_values = np.array(display_values).T
# Axis labels
y_labels = heatmap_df["Label"].values
x_labels = [
"Employment<br>Rate (%)",
"Unemployment<br>Rate (%)",
"Bachelor's<br>Degree+ (%)",
"Median HH<br>Income ($K)",
"Broadband<br>Access (%)",
"Homeownership<br>Rate (%)",
"Vacancy<br>Rate (%)",
]
# Custom hover text with original values
hover_text = []
for i, row in heatmap_df.iterrows():
    row_hover = []
    for j, metric in enumerate(metrics_for_heatmap):
        val = display_values[i][j]
        if metric == "Median Household Income":
            formatted_val = f"${val:.1f}K"
        else:
            formatted_val = f"{val:.1f}%"
        row_hover.append(
            f"<b>{metric}</b><br>{formatted_val}<br><b>{row['Label']}</b>"
        )
    hover_text.append(row_hover)
# Heatmap: colors reflect normalized scores, text shows original metric values
fig = go.Figure(
data=go.Heatmap(
z=z_data_normalized,
x=x_labels,
y=y_labels,
colorscale="RdYlBu", # low normalized score = red (weaker), high = blue (stronger); the "_r" variant reverses this
text=np.round(display_values, 1),
texttemplate="%{text}",
textfont={"size": 10},
hovertext=hover_text,
hoverinfo="text",
colorbar=dict(
# title.side / title.font replace the deprecated titleside / titlefont attributes
title=dict(
text="Outcome<br>Quality",
side="right",
font=dict(size=12, family="Arial, sans-serif"),
),
tickmode="array",
tickvals=[0, 50, 100],
ticktext=["Worse", "Average", "Better"],
tickfont=dict(size=11, family="Arial, sans-serif"),
),
)
)
# Horizontal separators between income groups (assuming 4 rows per geography)
fig.add_shape(
type="line",
x0=-0.5,
x1=len(x_labels) - 0.5,
y0=3.5,
y1=3.5,
line=dict(color="white", width=3),
)
fig.add_shape(
type="line",
x0=-0.5,
x1=len(x_labels) - 0.5,
y0=7.5,
y1=7.5,
line=dict(color="white", width=3),
)
fig.update_layout(
title=dict(
text="Economic Opportunity Scorecard by Geography and Income Level",
x=0.5,
xanchor="center",
font=dict(size=24, family="Arial, sans-serif", color="#2c3e50"),
),
annotations=[
dict(
text=(
"Mean values across census tracts | "
"Blue = better outcomes, Red = worse outcomes (min-max normalized within each metric)"
),
x=0.5,
y=-0.12,
xref="paper",
yref="paper",
xanchor="center",
yanchor="top",
showarrow=False,
font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
)
],
xaxis=dict(
title="",
side="bottom",
tickfont=dict(size=11, family="Arial, sans-serif"),
),
yaxis=dict(
title="",
tickfont=dict(size=11, family="Arial, sans-serif"),
),
height=700,
width=1100,
template="plotly_white",
margin=dict(t=100, b=100, l=200, r=150),
)
fig.show()
Interpretation:
The heatmap reveals a stark pattern: within each geography, the low-income row sits at the weak end of the color scale on nearly every metric, indicating uniformly poor outcomes, while the upper-income row sits at the strong end, signaling consistently strong performance. This isn't about one or two problem areas; disadvantage is comprehensive.
Looking across the rows, low-income tracts in Allegheny score poorly on employment (89.2%), education (7.5% Bachelor's+), median income ($34K), broadband (33.6%), and homeownership (34.6%). Upper-income tracts lead on nearly every dimension: 96.7% employment, 23.3% Bachelor's+, $114.5K median income, and 75.5% homeownership. Broadband is the one exception, peaking in the middle tier (41.7%) rather than the upper tier (39.3%). The gradual color shift as you move up income levels shows this isn't a binary divide but a smooth progression.
Critically, and in continuation from 6.1, this pattern holds across all three geographic scales—Allegheny mirrors Pennsylvania mirrors nationwide. The consistency suggests these aren't problems unique to Pittsburgh's post-industrial transition but reflect systemic American inequities.
Policy Implication: Single-issue interventions won't suffice. A tract struggling with employment also faces education deficits, infrastructure gaps, and housing instability. Effective policy requires coordinated, multi-dimensional approaches that address economic opportunity holistically rather than treating symptoms in isolation.
6.3 Opportunity Gaps Across Key Economic Indicators¶
• Description: 2×3 subplot grid showing grouped bar charts for six metrics (Employment Rate, Bachelor's Degree+, Broadband Access, Homeownership Rate, Unemployment Rate, Vacancy Rate), each comparing Low/Moderate vs Middle/Upper income tracts across three geographies.
• Objective: Examine each dimension of opportunity independently while maintaining cross-geographic comparability, allowing detailed assessment of where gaps are largest and whether Allegheny County shows unique patterns relative to Pennsylvania and nationwide averages.
• Methodology: For each of six key metrics, calculate mean values separately for Low/Moderate income tracts and Middle/Upper income tracts across three geographic scales (Allegheny County, Pennsylvania, Nationwide). Display side-by-side grouped bars showing Low/Moderate (red) and Middle/Upper (blue) performance for all three geographies within each subplot. Maintain a consistent color scheme across all six panels to enable pattern recognition. Calculate percentage point gaps (Middle/Upper minus Low/Moderate) for each metric-geography combination. Arrange metrics in a 2×3 grid with employment, education, and broadband in the top row, and homeownership, unemployment, and vacancy in the bottom row.
# 6.3 Bar chart comparison of key indicators by income group and geography
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Metrics to compare and their subplot titles
gap_metrics_subplot = [
("Employment Rate", "Employment Rate (%)"),
("Bachelors_Plus_Rate", "Bachelor's Degree+ (%)"),
("Broadband Rate", "Broadband Access (%)"),
("Homeownership Rate", "Homeownership Rate (%)"),
("Unemployment Rate", "Unemployment Rate (%)"),
("Vacancy Rate", "Vacancy Rate (%)"),
]
# Build comparison data for each metric
subplot_data = {}
for metric, label in gap_metrics_subplot:
    metric_comparison = []
    for geo_name, geo_data in [
        ("Allegheny County", allegheny),
        ("Pennsylvania", pennsylvania),
        ("Nationwide", nationwide),
    ]:
        low_mod = geo_data[geo_data["Distress_Category"] == "Low/Moderate Income"][metric].mean()
        mid_upper = geo_data[geo_data["Distress_Category"] == "Middle/Upper Income"][metric].mean()
        gap = mid_upper - low_mod
        metric_comparison.append(
            {
                "Geography": geo_name,
                "Low/Moderate": low_mod,
                "Middle/Upper": mid_upper,
                "Gap": gap,
            }
        )
    subplot_data[metric] = pd.DataFrame(metric_comparison)
# Create 2x3 grid of subplots
fig = make_subplots(
rows=2,
cols=3,
subplot_titles=[label for _, label in gap_metrics_subplot],
vertical_spacing=0.15,
horizontal_spacing=0.10,
)
positions = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
# Add traces for each metric
for idx, ((metric, label), (row, col)) in enumerate(zip(gap_metrics_subplot, positions)):
    df_metric = subplot_data[metric]

    # Low/Moderate income bars
    fig.add_trace(
        go.Bar(
            name="Low/Moderate Income",
            x=df_metric["Geography"],
            y=df_metric["Low/Moderate"],
            marker_color="#e74c3c",
            text=df_metric["Low/Moderate"].round(1),
            texttemplate="%{text}",
            textposition="outside",
            textfont=dict(size=9),
            showlegend=(idx == 0),
            legendgroup="low_mod",
            hovertemplate="<b>%{x}</b><br>Low/Moderate: %{y:.1f}%<extra></extra>",
        ),
        row=row,
        col=col,
    )

    # Middle/Upper income bars
    fig.add_trace(
        go.Bar(
            name="Middle/Upper Income",
            x=df_metric["Geography"],
            y=df_metric["Middle/Upper"],
            marker_color="#3498db",
            text=df_metric["Middle/Upper"].round(1),
            texttemplate="%{text}",
            textposition="outside",
            textfont=dict(size=9),
            showlegend=(idx == 0),
            legendgroup="mid_upper",
            hovertemplate="<b>%{x}</b><br>Middle/Upper: %{y:.1f}%<extra></extra>",
        ),
        row=row,
        col=col,
    )

    # Axes styling per subplot
    fig.update_yaxes(
        title_text="%",
        title_font=dict(size=10),
        tickfont=dict(size=9),
        gridcolor="#ecf0f1",
        row=row,
        col=col,
    )
    fig.update_xaxes(
        tickfont=dict(size=9),
        tickangle=-45,
        row=row,
        col=col,
    )
# Overall layout
fig.update_layout(
title=dict(
text="Opportunity Gaps Across Key Economic Indicators",
x=0.5,
xanchor="center",
font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
),
annotations=list(fig.layout.annotations)
+ [
dict(
text=(
"Comparison of Low/Moderate Income vs Middle/Upper Income census tracts "
"across Allegheny County, Pennsylvania, and the U.S."
),
x=0.5,
y=-0.08,
xref="paper",
yref="paper",
xanchor="center",
yanchor="top",
showarrow=False,
font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
)
],
height=800,
width=1200,
showlegend=True,
legend=dict(
orientation="h",
yanchor="bottom",
y=1.05,
xanchor="center",
x=0.5,
font=dict(size=12, family="Arial, sans-serif"),
bgcolor="rgba(255,255,255,0.8)",
bordercolor="#bdc3c7",
borderwidth=1,
),
template="plotly_white",
barmode="group",
margin=dict(t=140, b=100, l=60, r=60),
)
fig.show()
Interpretation:
While Allegheny County largely mirrors statewide and national patterns, a closer look reveals where local conditions differ—and where targeted intervention could make the biggest impact.
The education gap is severe everywhere, but Allegheny shows slightly better performance in upper-income tracts (23.3% Bachelor's+) compared to Pennsylvania (20.7%) and nationwide (20.8%). This likely reflects the concentration of universities and knowledge-economy jobs in Pittsburgh. However, low/moderate-income tracts still lag far behind at just 11.2%, creating a local divide that's particularly stark given the region's educational assets.
Broadband access deserves a closer look. The gap here (2.6 percentage points) is larger than Pennsylvania's (1.7) but smaller than the nation's (3.1), and absolute rates are concerning: the broadband rate averages only 38.0% in low/moderate-income Allegheny tracts versus 40.6% in middle/upper areas. In a region marketing itself as a tech hub, this digital divide undermines economic inclusion.
Unemployment and homeownership gaps roughly track national averages, but the vacancy rate story is distinctly local. Allegheny's low/moderate-income tracts show 15.6% vacancy, higher than Pennsylvania (12.7%) or nationwide (11.9%), likely a legacy of deindustrialization and population loss that continues to destabilize neighborhoods.
Policy Implication: Allegheny can't solve national structural inequities alone, but it can address local infrastructure gaps. Expanding broadband in low-income neighborhoods and stabilizing housing markets through anti-blight initiatives would directly target Allegheny's most distinctive challenges.
6.4 Relationship Between Median Income and Employment Rate¶
• Description: Scatter plot with trendline showing relationship between median household income (x-axis) and employment rate (y-axis) for all census tracts, color-coded by geography.
• Objective: Test the correlation between income and employment outcomes, examining whether higher-income areas systematically show stronger labor force participation.
• Methodology: Plot each census tract as a point with opacity to show density. Fit ordinary least squares (OLS) trendline across all tracts. Color points by geography (Allegheny = blue, Pennsylvania = purple, Nationwide = orange). Include hover details for individual tract identification.
# 6.4 Income–employment relationship across geographies
import plotly.express as px
import plotly.graph_objects as go
# Assemble scatter data for all three geographies
scatter_data = []
for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    temp_df = geo_data[
        [
            "Median Household Income",
            "Employment Rate",
            "FFIEC Tract income level (2022)",
            "Tract Name",
        ]
    ].copy()
    temp_df["Geography"] = geo_name
    scatter_data.append(temp_df)
scatter_df = pd.concat(scatter_data, ignore_index=True)
# Filter out outliers and invalid values
scatter_df = scatter_df[
(scatter_df["Median Household Income"] > 0)
& (scatter_df["Median Household Income"] < 250_000)
& (scatter_df["Employment Rate"] > 0)
& (scatter_df["Employment Rate"] < 100)
]
# Scatter plot with overall OLS trendline
fig = px.scatter(
scatter_df,
x="Median Household Income",
y="Employment Rate",
color="Geography",
color_discrete_map={
"Allegheny County": "#1f77b4",
"Pennsylvania": "#9467bd",
"Nationwide": "#ff7f0e",
},
opacity=0.3,
hover_data={
"Tract Name": True,
"FFIEC Tract income level (2022)": True,
"Median Household Income": ":$,.0f",
"Employment Rate": ":.1f",
"Geography": True,
},
labels={
"Median Household Income": "Median Household Income",
"Employment Rate": "Employment Rate (%)",
},
trendline="ols",
trendline_scope="overall",
trendline_color_override="#2c3e50",
)
# Emphasize the trendline
for trace in fig.data:
    if getattr(trace, "mode", None) == "lines":
        trace.line.width = 3
        trace.line.dash = "dash"
fig.update_layout(
title=dict(
text=(
"Nation-Level Relationship Between Median Income and Employment Rate"
"<br><sub>Higher income tracts show stronger employment outcomes</sub>"
),
x=0.5,
xanchor="center",
font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
),
xaxis=dict(
title="Median Household Income ($)",
title_font=dict(size=14, family="Arial, sans-serif"),
tickformat="$,.0f",
gridcolor="#ecf0f1",
tickfont=dict(size=11),
),
yaxis=dict(
title="Employment Rate (%)",
title_font=dict(size=14, family="Arial, sans-serif"),
gridcolor="#ecf0f1",
range=[70, 100],
tickfont=dict(size=11),
),
height=600,
width=1000,
template="plotly_white",
plot_bgcolor="white",
legend=dict(
title=dict(text="Geography", font=dict(size=13, family="Arial, sans-serif")),
font=dict(size=12, family="Arial, sans-serif"),
bgcolor="rgba(255,255,255,0.9)",
bordercolor="#bdc3c7",
borderwidth=1,
x=0.02,
y=0.98,
xanchor="left",
yanchor="top",
),
annotations=[
dict(
text="Dashed line shows overall trend across all census tracts",
x=0.5,
y=-0.15,
xref="paper",
yref="paper",
xanchor="center",
yanchor="top",
showarrow=False,
font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
)
],
margin=dict(t=120, b=100, l=80, r=50),
)
fig.show()
Interpretation:
The scatter plot confirms a clear positive relationship: as median household income rises, employment rates climb steadily. The upward trendline shows that for roughly every $50,000 increase in median income, employment rates gain approximately 5-7 percentage points. But the real story is in the spread.
At the lower end (tracts with median incomes below $50,000), employment rates scatter widely, from 70% to 95% within the plotted range (the y-axis is cut off at 70%). This variation is critical: it shows that low income doesn't doom a community to poor employment outcomes. Some disadvantaged tracts achieve employment rates matching or exceeding wealthier areas, suggesting protective factors (strong local employers, workforce programs, transit access) can make a difference.
As income rises above $100,000, the scatter tightens. Nearly all affluent tracts cluster at 95-100% employment with minimal variation, consistent with wealth bringing stability and consistent access to jobs. The densest concentration of points sits in the $40,000-$80,000 range at 85-95% employment, which is where most tracts fall.
Geographically, Allegheny County (blue), Pennsylvania (purple), and nationwide (orange) tracts blend together seamlessly across the entire income spectrum, reinforcing that these dynamics aren't regionally unique.
Policy Implication: The wide variation among low-income tracts is encouraging—disadvantage isn't destiny. Identifying what differentiates high-performing low-income tracts from struggling ones could reveal replicable interventions that break the income-employment correlation.
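The slope quoted in the interpretation above, and the idea of finding low-income tracts that beat the trend, can both be read off an ordinary least squares fit. The sketch below is a minimal plain-Python version on made-up tracts; `fit_line`, the tract values, and the $50,000 cutoff are all assumptions for illustration and do not reproduce the notebook's fitted line.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up tracts: (name, median household income in $, employment rate in %)
tracts = [
    ("Tract A", 30_000, 84.0),
    ("Tract B", 35_000, 93.0),
    ("Tract C", 60_000, 92.0),
    ("Tract D", 90_000, 95.0),
    ("Tract E", 120_000, 97.0),
]
income = [t[1] for t in tracts]
employment = [t[2] for t in tracts]

slope, intercept = fit_line(income, employment)
print(f"Change per $50,000 of income: {slope * 50_000:.2f} percentage points")

# Residual = actual minus predicted; positive means the tract beats the trend
LOW_INCOME_CUTOFF = 50_000  # hypothetical threshold for this sketch
above_trend = []
for name, inc, emp in tracts:
    residual = emp - (intercept + slope * inc)
    if inc < LOW_INCOME_CUTOFF and residual > 0:
        above_trend.append(name)
        print(f"{name}: {residual:+.2f} points above the trend")
```

For the figure in this section, `px.get_trendline_results(fig)` should expose the fitted model behind the dashed line, which would allow the same slope check on the real data. Since Allegheny tracts presumably also appear in the Pennsylvania and nationwide frames, fitting on the nationwide frame alone would avoid counting them more than once.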
6.5 Educational Attainment by Income Level¶
• Description: Three side-by-side stacked bar charts (one per geography) showing distribution of educational attainment (High School or Less, Some College, Associate Degree, Bachelor's+) across four income levels.
• Objective: Identify where the education gap is most pronounced and whether it's concentrated at specific credential levels or distributed across the full education spectrum.
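• Methodology: Take the mean Bachelor's+ rate and the mean Associate Degree share of population by income level and geography. "Some College" is approximated as 1.5 times the associate share and "High School or Less" is the remainder, so those two categories are rough estimates rather than measured values. Stack categories in consistent color order (red = less education → blue = more education). Display percentages within bars.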
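The cell that follows estimates "Some College" as a multiple of the associate share and treats "High School or Less" as the remainder. If counts for all four categories were available, the shares could instead be computed directly so that they sum to 100 by construction. A minimal sketch, assuming hypothetical category counts (the numbers below are made up and are not ACS values) and a hypothetical helper `education_shares`:

```python
EDUCATION_CATEGORIES = [
    "High School or Less",
    "Some College",
    "Associate Degree",
    "Bachelor's+",
]


def education_shares(counts):
    """Convert category counts into percentage shares that sum to 100."""
    total = sum(counts[cat] for cat in EDUCATION_CATEGORIES)
    if total == 0:
        return {cat: 0.0 for cat in EDUCATION_CATEGORIES}
    return {cat: counts[cat] / total * 100 for cat in EDUCATION_CATEGORIES}


# Made-up counts of adults for one income tier
example_counts = {
    "High School or Less": 4_000,
    "Some College": 2_500,
    "Associate Degree": 1_000,
    "Bachelor's+": 2_500,
}

shares = education_shares(example_counts)
for cat in EDUCATION_CATEGORIES:
    print(f"{cat:<20} {shares[cat]:5.1f}%")
```

Summing counts within a tier before dividing also weights each tract by its population, which differs from averaging tract-level rates.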
# 6.5 Educational attainment by income level and geography
import plotly.graph_objects as go
from plotly.subplots import make_subplots
education_data = []
# Build approximate education breakdown by income level
for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    for income_level in ["Low", "Moderate", "Middle", "Upper"]:
        subset = geo_data[geo_data["FFIEC Tract income level (2022)"] == income_level]
        if len(subset) == 0:
            continue
        # Bachelor's+ share
        bachelors_plus = subset["Bachelors_Plus_Rate"].mean()
        # Associate degree share relative to population
        associates = (subset["Associates Degree"] / subset["Total Population"] * 100).mean()
        # Approximate "Some College" (rough estimate: 1.5x the associate share)
        some_college_estimate = associates * 1.5
        # Remainder = high school or less
        hs_or_less = 100 - bachelors_plus - associates - some_college_estimate
        education_data.append(
            {
                "Geography": geo_name,
                "Income Level": income_level,
                "High School or Less": max(0, hs_or_less),
                "Some College": some_college_estimate,
                "Associate Degree": associates,
                "Bachelor's+": bachelors_plus,
            }
        )
education_df = pd.DataFrame(education_data)
# Stacked bars for each geography
fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=["Allegheny County", "Pennsylvania", "Nationwide"],
    horizontal_spacing=0.08,
)
colors = {
    "High School or Less": "#e74c3c",
    "Some College": "#e67e22",
    "Associate Degree": "#f39c12",
    "Bachelor's+": "#3498db",
}
education_categories = [
    "High School or Less",
    "Some College",
    "Associate Degree",
    "Bachelor's+",
]
for col_idx, geo_name in enumerate(["Allegheny County", "Pennsylvania", "Nationwide"], 1):
    geo_subset = education_df[education_df["Geography"] == geo_name]
    for edu_cat in education_categories:
        fig.add_trace(
            go.Bar(
                name=edu_cat,
                x=geo_subset["Income Level"],
                y=geo_subset[edu_cat],
                marker_color=colors[edu_cat],
                showlegend=(col_idx == 1),
                legendgroup=edu_cat,
                text=geo_subset[edu_cat].round(1),
                texttemplate="%{text:.0f}%",
                textposition="inside",
                textfont=dict(size=9, color="white"),
                hovertemplate=(
                    "<b>%{x} Income</b><br>"
                    + edu_cat
                    + ": %{y:.1f}%<extra></extra>"
                ),
            ),
            row=1,
            col=col_idx,
        )
    fig.update_xaxes(
        categoryorder="array",
        categoryarray=["Low", "Moderate", "Middle", "Upper"],
        tickfont=dict(size=11),
        row=1,
        col=col_idx,
    )
    fig.update_yaxes(
        title_text="Percentage (%)" if col_idx == 1 else "",
        title_font=dict(size=12),
        tickfont=dict(size=10),
        range=[0, 100],
        gridcolor="#ecf0f1",
        row=1,
        col=col_idx,
    )
fig.update_layout(
    title=dict(
        text="Educational Attainment by Income Level",
        x=0.5,
        xanchor="center",
        font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
    ),
    barmode="stack",
    height=550,
    width=1300,
    template="plotly_white",
    legend=dict(
        title=dict(text="Education Level", font=dict(size=13)),
        font=dict(size=12, family="Arial, sans-serif"),
        orientation="h",
        yanchor="bottom",
        y=1.1,
        xanchor="center",
        x=0.5,
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="#bdc3c7",
        borderwidth=1,
    ),
    annotations=list(fig.layout.annotations)
    + [
        dict(
            text=(
                "Lower-income tracts show higher shares of red/orange (lower attainment), "
                "while upper-income tracts show more blue (Bachelor's+)."
            ),
            x=0.5,
            y=-0.12,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d"),
        )
    ],
    margin=dict(t=150, b=100, l=70, r=50),
)
fig.show()
print("✓ Education attainment visualization created!")
✓ Education attainment visualization created!
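A note on the averaging in the cell above: it takes the mean of tract-level rates, so a 500-person tract counts as much as a 4,000-person tract. A population-weighted share is one possible cross-check. The sketch below uses made-up counts and a hypothetical helper, `weighted_share`, with column names borrowed from the notebook; it is not part of the original analysis.

```python
import pandas as pd


def weighted_share(df, count_col, pop_col="Total Population"):
    """Percent of the pooled population counted in count_col."""
    return df[count_col].sum() / df[pop_col].sum() * 100


# Made-up counts for two tracts of very different size
toy = pd.DataFrame(
    {
        "Associates Degree": [50, 200],
        "Total Population": [500, 4000],
    }
)

# Unweighted: mean of per-tract rates (the approach used in the cell above)
unweighted = (toy["Associates Degree"] / toy["Total Population"] * 100).mean()
# Weighted: pooled count divided by pooled population
weighted = weighted_share(toy, "Associates Degree")

print(f"unweighted={unweighted:.2f}%, weighted={weighted:.2f}%")
```

When tract sizes vary a lot within an income level, the two figures can diverge, so it may be worth reporting which one a chart uses.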
Interpretation:
The education divide is steep, though it is not a full reversal. In low-income tracts across all three geographies, 77-83% of residents fall into the high-school-or-less category (red dominance). In upper-income tracts, Bachelor's degrees or higher (blue) account for 21-23% of the population, while high-school-or-less drops to 62-65%. High school or less remains the largest group even in upper-income tracts, but its share falls by somewhere between 12 and 21 percentage points.
What's striking is where the gap isn't. The "Some College" (orange) and "Associate Degree" (yellow-orange) categories show little variation across income levels, hovering around 8-13% regardless of tract wealth. One caveat: "Some College" is estimated here as 1.5 times the Associate share rather than measured directly, so its flatness mirrors the Associate pattern by construction. With that caveat, the data suggest that Associate-level attainment is only weakly associated with tract income, and that the sharper dividing line is the Bachelor's degree.
Allegheny County shows a slightly sharper education gradient than Pennsylvania or nationwide. Upper-income Allegheny tracts have 23% Bachelor's+ attainment compared to 21% statewide and nationally, possibly reflecting Pittsburgh's concentration of universities and professional employers. But low-income Allegheny tracts mirror the national pattern at 77% high school or less, which suggests the region's educational assets aren't reaching disadvantaged communities.
The middle and moderate-income categories (center two bars) show intermediate patterns, confirming this is a spectrum rather than a binary. The Low-to-Upper contrast is still clear: the high-school-or-less share is markedly lower, and the Bachelor's+ share markedly higher, in upper-income tracts.
Policy Implication: In this data, Associate's degrees and "some college" do not track tract income the way Bachelor's degrees do. If education policy aims to improve economic mobility, Bachelor's degree completion, not just college access, deserves the focus.
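The "where the gap is and isn't" reading can be made explicit by subtracting Low from Upper shares per category. The sketch below assumes a frame shaped like `education_df` for a single geography; the numbers are illustrative placeholders rather than the notebook's output.

```python
import pandas as pd

categories = ["High School or Less", "Some College", "Associate Degree", "Bachelor's+"]

# Illustrative shares for one geography (placeholders, not real results)
toy = pd.DataFrame(
    {
        "Income Level": ["Low", "Upper"],
        "High School or Less": [78.0, 63.0],
        "Some College": [9.0, 9.0],
        "Associate Degree": [6.0, 6.0],
        "Bachelor's+": [7.0, 22.0],
    }
).set_index("Income Level")

# Percentage-point change from Low to Upper, per category
gap = (toy.loc["Upper", categories] - toy.loc["Low", categories]).astype(float)
print(gap.sort_values())
```

Categories with a gap near zero contribute little to the Low-to-Upper contrast; large positive or negative gaps mark where the profiles diverge.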
6.6 Infrastructure Access Gaps in Allegheny County¶
• Description: Grouped bar chart comparing Broadband Access and Public Transit Use across four income levels, focused exclusively on Allegheny County tracts.
• Objective: Examine two critical infrastructure dimensions—digital access (broadband) and physical mobility (transit)—to assess whether infrastructure gaps correlate with income classification.
• Methodology: Calculate mean Broadband Access Rate and Transit Commute Rate for each income level within Allegheny County. Display as grouped bars with Broadband (blue) and Transit (orange) side-by-side for each income category.
# 6.6 Infrastructure access gaps within Allegheny County
import plotly.graph_objects as go
# Aggregate broadband and transit use by income level (Allegheny only)
infrastructure_data = []
for income_level in ["Low", "Moderate", "Middle", "Upper"]:
    subset = allegheny[allegheny["FFIEC Tract income level (2022)"] == income_level]
    if len(subset) == 0:
        continue
    infrastructure_data.append(
        {
            "Income Level": income_level,
            "Broadband Access": subset["Broadband Rate"].mean(),
            "Public Transit Use": subset["Transit Rate"].mean(),
            "n_tracts": len(subset),
        }
    )
infra_df = pd.DataFrame(infrastructure_data)
# Grouped bar chart: broadband vs transit by income level
fig = go.Figure()
fig.add_trace(
    go.Bar(
        name="Broadband Access",
        x=infra_df["Income Level"],
        y=infra_df["Broadband Access"],
        marker_color="#3498db",
        text=infra_df["Broadband Access"].round(1).astype(str) + "%",
        textposition="outside",
        textfont=dict(size=11),
        hovertemplate=(
            "<b>%{x} Income</b><br>"
            "Broadband Access: %{y:.1f}%<extra></extra>"
        ),
    )
)
fig.add_trace(
    go.Bar(
        name="Public Transit Use",
        x=infra_df["Income Level"],
        y=infra_df["Public Transit Use"],
        marker_color="#e67e22",
        text=infra_df["Public Transit Use"].round(1).astype(str) + "%",
        textposition="outside",
        textfont=dict(size=11),
        hovertemplate=(
            "<b>%{x} Income</b><br>"
            "Public Transit Use: %{y:.1f}%<extra></extra>"
        ),
    )
)
fig.update_layout(
    title=dict(
        text="Infrastructure Access Gaps in Allegheny County",
        x=0.5,
        xanchor="center",
        font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
    ),
    xaxis=dict(
        title="Income Level",
        title_font=dict(size=14),
        categoryorder="array",
        categoryarray=["Low", "Moderate", "Middle", "Upper"],
        tickfont=dict(size=12),
    ),
    yaxis=dict(
        title="Percentage (%)",
        title_font=dict(size=14),
        tickfont=dict(size=11),
        gridcolor="#ecf0f1",
        range=[
            0,
            max(
                infra_df["Broadband Access"].max(),
                infra_df["Public Transit Use"].max(),
            )
            + 10,
        ],
    ),
    barmode="group",
    height=550,
    width=900,
    template="plotly_white",
    plot_bgcolor="white",
    legend=dict(
        title=dict(text="Infrastructure Type", font=dict(size=13)),
        font=dict(size=12, family="Arial, sans-serif"),
        orientation="h",
        yanchor="bottom",
        y=1.08,
        xanchor="center",
        x=0.5,
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="#bdc3c7",
        borderwidth=1,
    ),
    annotations=[
        dict(
            text=(
                "Digital access rises with income, while transit use is highest "
                "in lower-income areas."
            ),
            x=0.5,
            y=-0.15,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
        )
    ],
    margin=dict(t=130, b=100, l=80, r=50),
)
fig.show()
# Simple numeric summary for the narrative
print("Infrastructure gap summary (Allegheny County):")
broadband_low = infra_df.loc[infra_df["Income Level"] == "Low", "Broadband Access"].values[0]
broadband_upper = infra_df.loc[infra_df["Income Level"] == "Upper", "Broadband Access"].values[0]
transit_low = infra_df.loc[infra_df["Income Level"] == "Low", "Public Transit Use"].values[0]
transit_upper = infra_df.loc[infra_df["Income Level"] == "Upper", "Public Transit Use"].values[0]
print(f"- Broadband gap (Upper - Low): {broadband_upper - broadband_low:.1f} percentage points")
print(f"- Transit use: higher in {'lower' if transit_low > transit_upper else 'upper'}-income areas")
Infrastructure gap summary (Allegheny County):
- Broadband gap (Upper - Low): 5.8 percentage points
- Transit use: higher in lower-income areas
Interpretation:
Allegheny County shows a digital divide: the mean broadband rate is 33.6% in low-income tracts compared to 39.3% in upper-income tracts, a gap of nearly 6 percentage points. While this might seem modest, it can translate to thousands of households cut off from remote work, telehealth, online education, and essential digital services. Even the highest rate (41.7% in middle-income tracts) is strikingly low, suggesting that broadband access, as measured here, lags across the county.
Transit usage tells the opposite story, pointing to dependency rather than access. Low-income tracts show 7.6% transit commuting, nearly triple the 2.6% rate in upper-income areas. This pattern likely reflects necessity more than preference: residents of disadvantaged neighborhoods are more likely to lack vehicle access, while affluent residents tend to drive. The drop from 7.6% (low) to 5.3% (moderate) to 3.6% (middle) to 2.6% (upper) shows a clear inverse relationship between income and transit use.
The combination is problematic. Low-income residents depend more heavily on transit for basic mobility but have less of the broadband access needed for flexible work arrangements or digital job applications. This can create compounding barriers: if you can't work remotely because of poor internet, you are pushed toward jobs reachable by transit, which limits employment options to the routes the bus serves.
Policy Implication: Infrastructure investments should be bundled. Expanding broadband in low-income neighborhoods while simultaneously improving transit frequency and coverage would address complementary mobility barriers. One without the other leaves gaps unfilled.
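The headline comparisons above can be reproduced from the rounded rates quoted in the interpretation. This is a small arithmetic check, not new data; the dictionary names are placeholders.

```python
# Mean tract rates by income level, as quoted in the interpretation above
broadband = {"Low": 33.6, "Upper": 39.3}
transit = {"Low": 7.6, "Moderate": 5.3, "Middle": 3.6, "Upper": 2.6}

# Gap in percentage points and ratio between the two ends of the scale
broadband_gap = broadband["Upper"] - broadband["Low"]
transit_ratio = transit["Low"] / transit["Upper"]

# Transit use should fall at every step up the income ladder
levels = ["Low", "Moderate", "Middle", "Upper"]
strictly_decreasing = all(
    transit[a] > transit[b] for a, b in zip(levels, levels[1:])
)

print(f"Broadband gap: {broadband_gap:.1f} pp")
print(f"Transit ratio (Low / Upper): {transit_ratio:.2f}x")
print(f"Transit falls at every step: {strictly_decreasing}")
```

The rounded inputs give a 5.7-point broadband gap, slightly below the 5.8 printed by the notebook, which works from unrounded means; the transit ratio of about 2.9 is what "nearly triple" refers to.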
6.7 Income Classification by Census Tract - Allegheny County¶
• Description: Interactive choropleth map of Allegheny County showing FFIEC income classification (Low, Moderate, Middle, Upper) for each census tract with census boundaries visible.
• Objective: Visualize the geographic distribution of income levels to identify spatial clustering patterns and test whether low-income tracts concentrate in specific areas of the county.
• Methodology: Merge census tract shapefiles with income classification data using 11-digit GEOID. Color-code tracts by income level (Red = Low, Orange = Moderate, Yellow = Middle, Blue = Upper). Enable hover tooltips showing tract details.
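The 11-digit GEOID is the 2-digit state FIPS, 3-digit county FIPS, and 6-digit tract code joined together, each zero-padded. The sketch below shows why the padding matters when codes have been read as integers, and uses the `indicator` option of pandas `merge` to list shapefile tracts with no matching indicator row. The helper `build_geoid`, the toy frames, and the sample codes are illustrative assumptions.

```python
import pandas as pd


def build_geoid(state, county, tract):
    """Zero-pad state (2), county (3), tract (6) and join into an 11-digit GEOID."""
    return str(state).zfill(2) + str(county).zfill(3) + str(tract).zfill(6)


# Hypothetical rows: codes read as integers have lost their leading zeros
indicators = pd.DataFrame(
    {"state": [42, 42], "county": [3, 3], "tract": [10300, 20100]}
)
indicators["GEOID_full"] = [
    build_geoid(s, c, t)
    for s, c, t in zip(indicators["state"], indicators["county"], indicators["tract"])
]

# Stand-in for the shapefile's GEOID column
shapes = pd.DataFrame({"GEOID_full": ["42003010300", "42003020100", "42003980000"]})

# indicator=True adds a _merge column recording which side each row came from
check = shapes.merge(indicators, on="GEOID_full", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "GEOID_full"].tolist()
print(unmatched)
```

Any GEOIDs left unmatched after a check like this would end up in the "Unknown" category on the map.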
# 6.7 Choropleth: income classification by census tract (Allegheny County)
import geopandas as gpd
import plotly.express as px
import plotly.graph_objects as go
# Path to Pennsylvania tract shapefile (2022)
shapefile_path = "/Users/sofiahutton/Documents/Fall 2025 CMU Classes/visualizations with python /tl_2022_42_tract.shp"
# Load tracts and filter to Allegheny County (county FIPS = 003)
gdf = gpd.read_file(shapefile_path)
gdf_allegheny = gdf[gdf["COUNTYFP"] == "003"].copy()
# Build GEOID for merge
gdf_allegheny["GEOID_full"] = gdf_allegheny["GEOID"]
allegheny_for_map = allegheny.copy()
allegheny_for_map["GEOID_full"] = (
    allegheny_for_map["state"].astype(str).str.zfill(2)
    + allegheny_for_map["county"].astype(str).str.zfill(3)
    + allegheny_for_map["Tract Code (6-digit)"].astype(str).str.zfill(6)
)
# Merge shapefile with tract-level indicators
gdf_merged = gdf_allegheny.merge(
    allegheny_for_map[
        [
            "GEOID_full",
            "FFIEC Tract income level (2022)",
            "Tract Name",
            "Employment Rate",
            "Median Household Income",
            "Bachelors_Plus_Rate",
        ]
    ],
    on="GEOID_full",
    how="left",
)
# Income-level colors
color_map = {
    "Low": "#e74c3c",
    "Moderate": "#e67e22",
    "Middle": "#f39c12",
    "Upper": "#3498db",
    "Unknown": "#95a5a6",
}
# Fill missing classifications
gdf_merged["FFIEC Tract income level (2022)"] = gdf_merged[
    "FFIEC Tract income level (2022)"
].fillna("Unknown")
# Center map on the county's bounding box (avoids the GeoPandas warning about
# computing centroids in a geographic CRS)
minx, miny, maxx, maxy = gdf_merged.total_bounds
center_lat = (miny + maxy) / 2
center_lon = (minx + maxx) / 2
fig = px.choropleth_mapbox(
    gdf_merged,
    geojson=gdf_merged.geometry,
    locations=gdf_merged.index,
    color="FFIEC Tract income level (2022)",
    color_discrete_map=color_map,
    category_orders={
        "FFIEC Tract income level (2022)": ["Low", "Moderate", "Middle", "Upper", "Unknown"]
    },
    mapbox_style="carto-positron",
    center={"lat": center_lat, "lon": center_lon},
    zoom=9,
    opacity=0.7,
    hover_data={
        "Tract Name": True,
        "FFIEC Tract income level (2022)": True,
        "Employment Rate": ":.1f",
        "Median Household Income": ":$,.0f",
        "Bachelors_Plus_Rate": ":.1f",
    },
    labels={
        "FFIEC Tract income level (2022)": "Income Level",
        "Employment Rate": "Employment Rate (%)",
        "Median Household Income": "Median Income",
        "Bachelors_Plus_Rate": "Bachelor's Degree+ (%)",
    },
)
fig.update_layout(
    title=dict(
        text="Income Classification by Census Tract<br><sub>Allegheny County, Pennsylvania</sub>",
        x=0.5,
        xanchor="center",
        font=dict(size=24, family="Arial, sans-serif", color="#2c3e50"),
    ),
    margin=dict(r=0, t=80, l=0, b=0),
    height=700,
    legend=dict(
        title=dict(text="FFIEC Income Level", font=dict(size=14)),
        font=dict(size=12, family="Arial, sans-serif"),
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="#bdc3c7",
        borderwidth=2,
        x=0.02,
        y=0.98,
        xanchor="left",
        yanchor="top",
    ),
)
fig.show()